In March, Sys Admin published our "Miscellaneous Unix Tips: Answering Novice
Shell Questions" article. One of our tips described deleting a file named
"-". Readers Leon Schutte, Stewart Ravenhall, Andy Bach,
Robin Wakefield, and Royce Williams gently took us to task for not
mentioning the judicious use of the dash with the rm command. Since John
and I regret the oversight, I include the gist of their email:
-- indicates no further command options:
rm -- -
- explicitly marks the end of command line options
so everything after is considered a file:
rm - -
Both commands work on our Solaris 9 box. Again, we apologize.
Reader Royce Williams went on to state that his sys admin experience with file names with funny characters has left him scarred. As part of the healing experience, Royce presents Part 1 of "Removal of Files With Unusual Characters in Their Filenames".
Littera Delenda Est: On the Removal of Files with Unusual Characters in Their Filenames, Part One
by Royce Williams
To set the stage for this article, a little bit of history is in order.
In the 150s BC, the Roman statesman Cato was fed up with the neighboring city-state of Carthage. He was having trouble convincing his colleagues in the Roman Senate to authorize military action against Carthage, so he started wrapping up every speech that he made with some variation of the phrase Carthago delenda est ("Carthage must be destroyed").
Cato tried many different ways to say it: "And in conclusion, I encourage you to vote 'Yes' to preserve the olive tax ... and Carthage must be destroyed." "Gaul is pretty this time of year ... and Carthage must be destroyed." He kept at it until his fellow Senators gave in and declared war.
The lesson here is that some tasks call for both persistence and creativity. And here's where we get to the Unix part of the story.
In your daily sysadmin life, you may have encountered an irritating problem: Filenames with tricky characters in them that resist the usual deletion methods, either because they are hard to see, hard to type, or are interpreted by the shell as something other than a plain character.
However they are created - pressing a control key by accident, line noise, flaky terminals, pesky co-workers - some filenames are just odd enough that they need special handling to remove. Instead of resorting to the sysadmin version of slash-and-burn warfare (like moving all of the "good" files out of the way and then deleting the entire directory), I'd like to help you tackle the problem with a little more grace, precision, and speed.
First, I'll show what kinds of characters you're likely to encounter. Then, I'll describe how to tell them apart (which can be trickier than you might suspect), demonstrate how to delete them (and how not to delete them), and even touch on how to create filenames that use them.
Before we begin, a word of caution. The deletion of odd characters is a complex dance among the display and deletion capabilities of your terminal, shell, locale, userland utilities, filesystem, and operating system. Larger deletions carry greater risk. While I have tried most of these methods on a number of platforms (see "Test Platforms" section at the end of this column) and a few different shells, there is no substitute for your own careful testing. Even if you are confident in your environment, you should still test your deletion methods in an inconspicuous area and as an unprivileged user. Your mileage may vary. Caveat deletor.
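A scratch directory is the easiest inconspicuous area to practice in. The following is only a sketch: it assumes your system provides mktemp -d (common, but not required by POSIX), the variable names are invented for illustration, and it recreates just the printable names used in this column's sample directory.

```shell
# Build a disposable sandbox; nothing outside "$scratch" is touched.
scratch=$(mktemp -d) || exit 1

# Single quotes keep the shell from interpreting any of these characters.
for name in '!' '!deleteme' '!echo' '!keeper' '&' '*' '-' '-keeper' \
            '0' '>' '?' 'A' 'Z' '\' 'a' 'z' '~' '~keep-me-too'
do
    # The "$scratch/" prefix means no argument can begin with a hyphen.
    : > "$scratch/$name"
done

# Count the entries (none of these names contains a newline).
count=$(ls -A "$scratch" | wc -l | tr -d ' ')
echo "created $count files in $scratch"
```

Run it as an unprivileged user, rehearse each deletion there, and throw the whole directory away when you are done. On a case-insensitive filesystem, A and a (and Z and z) would collide, so the count can differ.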
Unusual characters that can appear in filenames fall into some broad categories:
Whitespace - Filenames that contain spaces or tabs.
Control characters - Cursor and terminal control characters fall into this category, including backspace, escape, and others.
Shell wildcards - The asterisk and the question mark are the usual suspects. Because they are automatically expanded to include all files that match, trying to delete a file called '*' in the middle of a directory full of important files can be intimidating.
Other shell characters - The hyphen/dash, the ampersand, the tilde, etc. all have special shell meanings that can make deletion go awry.
8-bit ASCII characters (above 127) - Most of the characters above 127 are from extended character sets for which there is no easily discoverable keystroke on some keyboards. (Filesystems that support multibyte (Unicode, etc.) characters are beyond the scope of this column, but may be interesting for future work.)
Special filesystem characters - The slash and the dot (when standing alone) and the NUL character (ASCII 0 / hex 0x0) are tricky for filenames because they have special uses for most Unix-like filesystems. If you have a file named just '/' or '.', or if there is a NUL in a directory entry somewhere other than as a separator between entries, then your filesystem has been manipulated or is otherwise corrupted. (Dealing with these characters is outside the scope of this column, primarily because it would take a sector editor that groks and writes to Unix-like filesystems to simulate the problem, and I don't know of any.) Note also that if you're running Cygwin, you'll inherit the colon from NTFS as a similarly restricted character.
As we'll see, these different families of characters present different challenges to you, our intrepid deleter.
The best way to teach the various approaches we can use to delete bad characters is by example. Let's imagine that you are roaming around your filesystem, minding your own business, when you discover that some strange new files have appeared in your working directory:
admin@unixlike$ whoami
admin
admin@unixlike$ pwd
/home/admin/projects
admin@unixlike$ /bin/ls -lA
total 52
-rw-r----- 1 admin admin 22 May 5 2007 ?
-rw-r----- 1 admin admin 27 May 5 2007 ?
-rw-r----- 1 admin admin 20 May 5 2007 ?
-rw-r----- 1 admin admin 26 May 5 2007 ?
-rw-r----- 1 admin admin 32 May 5 2007 ?
-rw-r----- 1 admin admin 23 May 5 2007
-rw-r----- 1 admin admin 32 May 5 2007 ?
-rw-r----- 1 admin admin 23 May 5 2007
-rw-r----- 1 admin admin 22 May 5 2007 !
-rw-r----- 1 admin admin 29 May 5 2007 !deleteme
-rw-r----- 1 admin admin 29 May 5 2007 !echo
-rw-r----- 1 admin admin 29 May 5 2007 !keeper
-rw-r----- 1 admin admin 27 May 5 2007 &
-rw-r----- 1 admin admin 26 May 5 2007 *
-rw-r----- 1 admin admin 22 May 5 2007 -
-rw-r----- 1 admin admin 29 May 5 2007 -keeper
-rw-r----- 1 admin admin 22 May 5 2007 0
-rw-r----- 1 admin admin 29 May 5 2007 >
-rw-r----- 1 admin admin 30 May 5 2007 ?
-rw-r----- 1 admin admin 29 May 5 2007 A
-rw-r----- 1 admin admin 29 May 5 2007 Z
-rw-r----- 1 admin admin 29 May 5 2007 \
-rw-r----- 1 admin admin 29 May 5 2007 a
-rw-r----- 1 admin admin 29 May 5 2007 z
-rw-r----- 1 admin admin 23 May 5 2007 ~
-rw-r----- 1 admin admin 29 May 5 2007 ~keep-me-too
-rw-r----- 1 admin admin 21 May 5 2007 ?
Imagine further that you have some Very Important Project Files that you need to keep, as follows:
!keeper -keeper 0 A Z a z ~keep-me-too
Some oddities should immediately stand out: There are filenames that are completely invisible, filenames that are not left-flush against the date column, and filenames with some characters that should just generally make experienced shell users uneasy.
There are also multiple files that appear to be named '?'. Since most filesystems don't allow duplicate filenames, something else must be going on. On Linux and the BSDs, this is the way that /bin/ls tries to sensibly present some otherwise unprintable characters.
This may sound simple enough, but some detailed explanation is warranted. Even though it's making our deletion job here potentially harder, this character obfuscation is actually a Good Thing. It suppresses those character sequences that might otherwise do some surprising things to your terminal.
On many Solaris and HP-UX systems, for example, the stock
/bin/ls tries to represent unprintable characters in
filenames a little too faithfully. Listing our directory above
on one of these systems would briefly flash some directory listing
information, emit a beep, clear your screen, and then display:
admin@flare$ /bin/ls -lA
[beep]
[screen clears]
-rw-r----- 1 admin admin 32 May 5 2007
-rw-r----- 1 admin admin 23 May 5 2007
-rw-r----- 1 admin admin 29 May 5 2007
-rw-r----- 1 admin admin 29 May 5 2007
-rw-r----- 1 admin admin 22 May 5 2007 !
-rw-r----- 1 admin admin 29 May 5 2007 !deleteme
[ ... some identical output omitted ... ]
-rw-r----- 1 admin admin 21 May 5 2007 ¥
This faithful reproduction is a double-edged sword: On the one hand, it makes whitespace filenames and some unprintable characters pretty difficult to tell apart (see the first two entries); on the other hand, it has revealed some potentially useful information about the last character in the directory. We'll see more about this one later.
Fortunately, the alternate, Berkeley-derived version of ls available on most Suns as /usr/ucb/ls will give you exactly the same output that the other systems' ls does. (Unfortunately, there is no similar native HP-UX alternative that I am aware of.)
It's handy to know that ls provides this protection only when it is writing directly to a terminal. To suppress the escaping of these characters, you can just pipe your ls output to another program to get the unfiltered characters in all their glory.
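Raw control characters are no easier to read than question marks, so it helps to pipe ls into something that spells out each byte. The sketch below uses od -c for that; the choice of od, the scratch directory, and the BEL-named file are my own assumptions for illustration, not something taken from the listings in this column.

```shell
scratch=$(mktemp -d) || exit 1

# Create a file whose name is a single BEL (Control-G, octal 007).
bell=$(printf '\007')
: > "$scratch/$bell"

# ls is writing to a pipe here, so it passes the raw byte through;
# od -c then prints one symbol per byte, showing BEL as \a.
dump=$(ls -A "$scratch" | od -c)
printf '%s\n' "$dump"
```

The same pipeline works on a real directory; each newline (\n) in the dump normally marks the end of one filename.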
Now that we've seen the enemy, let's engage. (Insert your favorite Picard joke here.)
The examples that I present here demonstrate multiple attack vectors. As we tackle each character in turn, the size of your toolbox should expand. If your situation is mentioned but the methods listed don't work for your flavor of Unix, keep reading. One of the later methods may do the trick.
Let's start with the hyphen or "dash".
admin@unixlike$ /bin/ls -lA -
-rw-r----- 1 admin admin 22 May 5 2007 -
Couldn't we just ... well ... delete it? Generally speaking, yes:
admin@unixlike$ /bin/ls -lA -
-rw-r----- 1 admin admin 22 May 5 2007 -
admin@unixlike$ rm -
admin@unixlike$
But that is where we would have run into our first snag on some Unix-likes. On some flavors (Solaris, for example), rm would have given this response instead:
admin@flare$ rm -
usage: rm [-fiRr] file ...
Let's discuss what we could have done if we weren't under some specific restrictions to preserve the other files in the directory.
On the one hand, if the - was the only file in this directory with a length of one character (or if you wanted to hit the problem with a sledgehammer), then this approach would have worked:
admin@unixlike$ /bin/ls -lA -
-rw-r----- 1 admin admin 22 May 5 2007 -
admin@unixlike$ rm ?
admin@unixlike$
... but this would remove most of the other files in our directory, too.
On the other hand, if we didn't have to keep -keeper, we could just remove all of the files that start with "-" by using a wildcard:
admin@unixlike$ /bin/ls -lA -
-rw-r----- 1 admin admin 22 May 5 2007 -
admin@unixlike$ rm -*
admin@unixlike$
... but this would take our aptly named -keeper with it.
On the gripping hand, there are a couple of better general solutions. It's handy to know that many commands support using two hyphens (--) to signal that you have finished passing options to a command and are now ready to start giving the command filenames or other arguments to actually operate on. Using this method, you can easily remove files that begin with a hyphen without having to use wildcards or rely on other unique criteria of the file, and it even works on Solaris:
admin@unixlike$ /bin/ls -lA -*
-rw-r----- 1 admin admin 22 May 5 2007 -
-rw-r----- 1 admin admin 29 May 5 2007 -keeper
admin@unixlike$ rm -- -
admin@unixlike$ /bin/ls -lA -*
-rw-r----- 1 admin admin 29 May 5 2007 -keeper
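If you would like to rehearse the double-hyphen method before trusting it, here is a minimal sketch that replays it in a throwaway directory. It assumes mktemp -d is available; the scratch and remaining variable names are mine.

```shell
scratch=$(mktemp -d) || exit 1
: > "$scratch/-"
: > "$scratch/-keeper"

# Run in a subshell so the caller's working directory is unchanged.
(
    cd "$scratch" || exit 1
    # Everything after "--" is an operand, so "-" is taken as a filename.
    rm -- -
)

remaining=$(ls -A "$scratch")
echo "left behind: $remaining"
```

Only the file named - is removed; -keeper is never mentioned on the command line, so it survives.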
There's another way that you can tackle the hyphen: by using a full or relative path to the file (so that rm knows that you are referring to a file and not to a flag):
admin@unixlike$ /bin/ls -lA -*
-rw-r----- 1 admin admin 22 May 5 2007 -
-rw-r----- 1 admin admin 29 May 5 2007 -keeper
admin@unixlike$ rm ./-
admin@unixlike$ /bin/ls -lA -*
-rw-r----- 1 admin admin 29 May 5 2007 -keeper
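The path prefix matters most when a filename looks exactly like a set of options. As an illustration only, the sketch below uses a hypothetical file named -rf (it is not one of the files in our sample directory) to show that the ./ prefix keeps rm from parsing the name as flags.

```shell
scratch=$(mktemp -d) || exit 1
: > "$scratch/-rf"        # a name that looks exactly like a pair of rm flags
: > "$scratch/-keeper"

(
    cd "$scratch" || exit 1
    # "./-rf" begins with a dot, so rm cannot mistake it for options.
    rm ./-rf
)

remaining=$(ls -A "$scratch")
echo "left behind: $remaining"
```

The same prefix works with wildcards: a pattern written as ./-* expands to names that all begin with ./ and so can never be read as options.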
Using find amounts to the same thing, because each result in find's output will be rooted in whatever path you specified as the starting point for the search:
admin@unixlike$ pwd
/home/admin/projects
admin@unixlike$ find /home/admin/projects -name '-' | xargs echo
/home/admin/projects/-
admin@unixlike$ find . -name '-' | xargs echo
./-
admin@unixlike$ find . -name '-' | xargs rm
admin@unixlike$
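One caveat with piping find into xargs: xargs splits its input on whitespace, so a name containing a space arrives as several separate arguments. As a hedged alternative, find's own -exec ... {} + form hands each path to rm intact. The filename "- odd name" and the scratch directory below are made up for the demonstration.

```shell
scratch=$(mktemp -d) || exit 1
: > "$scratch/- odd name"
: > "$scratch/-keeper"

# find prints paths rooted at "$scratch", so no argument starts with "-",
# and -exec passes each path to rm as a single argument, spaces and all.
find "$scratch" -type f -name '- odd name' -exec rm {} +

remaining=$(ls -A "$scratch")
echo "left behind: $remaining"
```

Swapping rm for echo in the -exec clause gives you the same dry run that the xargs echo trick provides.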
The last few methods I've demonstrated can be used to sidestep some of the other problems that arise when using some of our characters, especially shell special characters. Escaping helps, but there are other forces at work. Let's run through a few of them to see how they work.
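As you may know, you can also escape our hyphen with a backslash, although the shell strips the backslash before rm ever sees it, so this works only where a bare rm - already does: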
admin@unixlike$ /bin/ls -lA -*
-rw-r----- 1 admin admin 22 May 5 2007 -
-rw-r----- 1 admin admin 29 May 5 2007 -keeper
admin@unixlike$ rm \-
admin@unixlike$ /bin/ls -lA -*
-rw-r----- 1 admin admin 29 May 5 2007 -keeper
If you have any experience with wildcards, you already know why attempting to remove a file that's actually named "?" (the question mark) would be ill-advised, as it will remove all of your one-character-long filenames:
### Don't do this:
admin@unixlike$ rm ?
admin@unixlike$ /bin/ls -lA
total 12
-rw-r----- 1 admin admin 29 May 5 2007 !deleteme
-rw-r----- 1 admin admin 29 May 5 2007 !echo
-rw-r----- 1 admin admin 29 May 5 2007 !keeper
-rw-r----- 1 admin admin 29 May 5 2007 -keeper
-rw-r----- 1 admin admin 29 May 5 2007 ~keep-me-too
... or, worse:
### Don't do this, either:
admin@unixlike$ rm *
admin@unixlike$ /bin/ls -lA
admin@unixlike$
For both shell wildcards, escaping them with backslash is the way to go. To make sure that your escaping is working correctly, you can test the syntax with a quick ls command before you delete:
admin@unixlike$ /bin/ls -lA \* \?
-rw-r----- 1 admin admin 26 May 5 2007 *
-rw-r----- 1 admin admin 30 May 5 2007 ?
admin@unixlike$ rm \*
admin@unixlike$ rm \?
admin@unixlike$
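Single quotes do the same job as the backslash and can be easier to read. The sketch below is a rehearsal under assumptions of my own (a scratch directory from mktemp -d and stand-in keeper files A and Z): it previews with ls, then removes only the two wildcard-named files.

```shell
scratch=$(mktemp -d) || exit 1
for name in '*' '?' 'A' 'Z'; do
    : > "$scratch/$name"
done

(
    cd "$scratch" || exit 1
    # Preview first: quoted patterns reach ls as literal names.
    ls -d -- '*' '?'
    # Same quoting, now with rm; A and Z are never matched.
    rm -- '*' '?'
)

echo "left behind:"
ls -A "$scratch"
```

If the ls preview lists more than the files you meant to name, the quoting is wrong and the rm should not be run.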
The tilde (~) is expanded by some shells as your home directory (or whatever is in the HOME environment variable). Attempts to remove it will have predictable results (from refusal as an ordinary user to some potentially wide-sweeping results as root):
admin@unixlike$ echo $HOME
/home/admin
admin@unixlike$ echo ~
/home/admin
admin@unixlike$ rm ~
rm: cannot remove `/home/admin': Permission denied
admin@unixlike$ rm \~
admin@unixlike$
or:
admin@unixlike$ rm ./~
admin@unixlike$
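To see for yourself when the tilde is and is not expanded, a harmless printf comparison is enough. This is a sketch with variable names of my own choosing; the output of the unquoted case depends on your HOME setting, so I make no claim about what it prints on your system.

```shell
scratch=$(mktemp -d) || exit 1
: > "$scratch/~"

unquoted=$(printf '%s' ~)      # expanded by the shell before printf runs
quoted=$(printf '%s' '~')      # quoted, so it stays a literal tilde
prefixed=$(printf '%s' ./~)    # not at the start of the word: no expansion

printf 'unquoted=%s quoted=%s prefixed=%s\n' "$unquoted" "$quoted" "$prefixed"

# The path-prefixed form is therefore safe to hand to rm.
( cd "$scratch" && rm ./~ )
```

Only the first form is rewritten by the shell, which is why both rm \~ and rm ./~ reach the right file.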
The ampersand (&) is used to start the current command as a job in the background, returning the command shell to you immediately. Trying rm & will get you the same results as executing rm with no arguments, only with some unexpected output:
admin@unixlike$ rm &
[1] 13347
admin@unixlike$ usage: rm [-f|-i] [-dPRrvW] file ...
[1]+ Exit 1 rm
admin@unixlike$ rm ./&
[1] 16328
admin@unixlike$ rm: cannot remove `./': Is a directory
[1]+ Exit 1 rm ./
admin@unixlike$ rm \&
admin@unixlike$
The backslash (\) itself usually just needs to be escaped by another backslash. Without it, many shells will think that you want to continue the command on another line, will change the prompt to indicate that they need more input, and then will wait patiently (forever, or until the next reboot):
#### bash, ksh
admin@unixlike$ rm \
> [we can press enter to complete the command]
[rm explains its usage as if we just typed "rm [enter]"]
admin@unixlike$ rm \\
admin@unixlike$
#### csh
% rm \
? [We can also press Control-C to interrupt the command - safer.]
% rm \\
%
#### tcsh
> rm \
? [press Control-C]
> rm \\
>
Note that the prepended-path trick will not work for the backslash, because most shells interpret a trailing backslash as a continuation-of-command indicator and use it to assemble the full command line before parsing the literal characters. We have to escape our backslash in order to delete it:
admin@unixlike$ rm /home/admin/projects/\
> [press Control-C]
admin@unixlike$ rm /home/admin/projects/\\
admin@unixlike$
or just:
admin@unixlike$ rm \\
admin@unixlike$
Time for another history lesson of sorts. The exclamation point or "bang" (!) is used by some shells to recall portions of your command history and substitute them into the current command. In an interesting twist, unlike many of our other bad tenants, you can easily remove the single-character filename "!":
admin@unixlike$ rm \!
admin@unixlike$
Deleting a longer filename that begins with a bang, however, takes some extra mojo and can have some unintended consequences unless you're careful. The single bang wields special power. With it, we can search our command history for the last command starting with that string and execute it. Notice that the output shows you what is about to be executed just before the actual running of the command:
admin@unixlike$ echo Z
Z
admin@unixlike$ !echo
echo Z
Z
Next, we explore what some deletion attempts might do. Since we haven't issued any commands that start with "deleteme", an unescaped attempt to delete it simply fails without doing any harm:
admin@unixlike$ rm !deleteme
-bash: !deleteme: event not found
But our recent "echo Z" command comes back to bite us:
admin@unixlike$ rm !echo
rm echo Z
rm: echo: No such file or directory
admin@unixlike$ /bin/ls -lA Z
ls: Z: No such file or directory
Sadly, we've deleted our must-keep Z file. Here's how we should have done it (with escapes):
admin@unixlike$ rm \!deleteme
admin@unixlike$ rm \!echo
admin@unixlike$
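Whether the bang is dangerous depends on whether history expansion is switched on in the shell you are typing into. The sketch below checks for the H flag that bash places in $- when the feature is active, and then removes the files using single quotes, which are safe in either case. The scratch directory is a stand-in of my own, and the check is only meaningful in bash-style shells.

```shell
scratch=$(mktemp -d) || exit 1
for name in '!deleteme' '!echo' 'Z'; do
    : > "$scratch/$name"
done

# $- lists the active shell options; in bash an H means history
# expansion is on. Scripts and most other sh-style shells lack it.
case $- in
    *H*) echo 'history expansion is on: an unquoted ! is risky here' ;;
    *)   echo 'history expansion is off in this shell' ;;
esac

# Single quotes keep the ! literal whether or not expansion is on.
( cd "$scratch" && rm '!deleteme' '!echo' )

echo "left behind:"
ls -A "$scratch"
```

This time the must-keep Z file is still there afterward.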
Just like the backslash, a filename ending with one of the redirection or pipelining characters can cause the command to wait for input or otherwise get confused:
admin@unixlike$ rm >
-bash: syntax error near unexpected token `newline'
admin@unixlike$ rm \>
You can handle other redirection characters in similar fashion:
admin@unixlike$ rm <
-bash: syntax error near unexpected token `newline'
admin@unixlike$ rm \<
admin@unixlike$ rm |
> [press Control-C]
admin@unixlike$ rm \|
admin@unixlike$
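All of the shell operators covered so far can also be dispatched in one pass with single quotes. Treat the following as a sketch rather than a prescription: the scratch directory and the stand-in file named keeper are assumptions of mine, and the -- is there only as a belt-and-suspenders habit.

```shell
scratch=$(mktemp -d) || exit 1
for name in '>' '<' '|' '&' '\' 'keeper'; do
    : > "$scratch/$name"
done

(
    cd "$scratch" || exit 1
    # Inside single quotes none of these characters is an operator.
    rm -- '>' '<' '|' '&' '\'
)

remaining=$(ls -A "$scratch")
echo "left behind: $remaining"
```

Each quoted character reaches rm as an ordinary one-character filename, so nothing is redirected, piped, backgrounded, or continued onto another line.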
We've handled the first wave of attackers with relative ease, because we could identify them by sight. The characters that remain are much better at hiding themselves. Next month, we'll break out the night-vision goggles and track them down. I'll also give you some notes about all of the platforms used to test these methods and some handy references for further reading.
Royce Williams is a Unix-like systems administrator for an Alaskan telecommunications company. He was included in the package when they acquired the first Alaskan ISP. When not flushing bad characters to ground, Royce likes watching indie movies and trying to put FreeBSD on ancient hardware. He also has an Alaskan license plate problem. You can reach him at firstname.lastname@example.org.