UnixReview.com
May 2007

Shell Corner: Littera Delenda Est

Hosted by Ed Schaefer

In March, Sys Admin published our "Miscellaneous Unix Tips: Answering Novice Shell Questions" article. One of our tips described deleting a file named "-". Readers Leon Schutte, Stewart Ravenhall, Andy Bach, Robin Wakefield, and Royce Williams gently took us to task for not mentioning the judicious use of the dash with the rm command. Since John and I regret the oversight, I include the gist of their email:

Using -- indicates no further command options:

rm -- -

Using - explicitly marks the end of command line options so everything after is considered a file:

rm - -

Both commands work on our Solaris 9 box. Again, we apologize.

Reader Royce Williams went on to state that his sys admin experience with file names with funny characters has left him scarred. As part of the healing experience, Royce presents Part 1 of "Removal of Files With Unusual Characters in Their Filenames".

Littera Delenda Est: On the Removal of Files with Unusual Characters in Their Filenames, Part One

by Royce Williams

To set the stage for this article, a little bit of history is in order.

In the 150s BC, the Roman statesman Cato was fed up with the neighboring city-state of Carthage. He was having trouble convincing his colleagues in the Roman Senate to authorize military action against Carthage, so he started wrapping up every speech that he made with some variation of the phrase Carthago delenda est ("Carthage must be destroyed").

Cato tried many different ways to say it: "And in conclusion, I encourage you to vote 'Yes' to preserve the olive tax ... and Carthage must be destroyed." "Gaul is pretty this time of year ... and Carthage must be destroyed." He kept at it until his fellow Senators gave in and declared war.

The lesson here is that some tasks call for both persistence and creativity. And here's where we get to the Unix part of the story.

In your daily sysadmin life, you may have encountered an irritating problem: Filenames with tricky characters in them that resist the usual deletion methods, either because they are hard to see, hard to type, or are interpreted by the shell as something other than a plain character.

However they are created - pressing a control key by accident, line noise, flaky terminals, pesky co-workers - some filenames are just odd enough that they need special handling to remove. Instead of resorting to the sysadmin version of slash-and-burn warfare (like moving all of the "good" files out of the way and then deleting the entire directory), I'd like to help you tackle the problem with a little more grace, precision, and speed.

First, I'll show what kinds of characters you're likely to encounter. Then, I'll describe how to tell them apart (which can be trickier than you might suspect), demonstrate how to delete them (and how not to delete them), and even touch on how to create filenames that use them.

Before we begin, a word of caution. The deletion of odd characters is a complex dance among the display and deletion capabilities of your terminal, shell, locale, userland utilities, filesystem, and operating system. Larger deletions carry greater risk. While I have tried most of these methods on a number of platforms (see "Test Platforms" section at the end of this column) and a few different shells, there is no substitute for your own careful testing. Even if you are confident in your environment, you should still test your deletion methods in an inconspicuous area and as an unprivileged user. Your mileage may vary. Caveat deletor.

Know Your Enemy

Unusual characters that can appear in filenames fall into some broad categories:

Whitespace - Filenames that contain spaces or tabs.

Control characters - Cursor and terminal control characters fall into this category, including backspace, escape, and others.

Shell wildcards - The asterisk and the question mark are the usual suspects. Because they are automatically expanded to include all files that match, trying to delete a file called '*' in the middle of a directory full of important files can be intimidating.

Other shell characters - The hyphen/dash, the ampersand, the tilde, etc. all have special shell meanings that can make deletion go awry.

8-bit ASCII characters (above 127) - Most of the characters above 127 are from extended character sets for which there is no easily discoverable keystroke on some keyboards. (Filesystems that support multibyte (Unicode, etc.) characters are beyond the scope of this column, but may be interesting for future work.)

Special filesystem characters - The slash and the dot (when standing alone) and the NUL character (ASCII 0 / hex 0x0) are tricky for filenames because they have special uses for most Unix-like filesystems. If you have a file named just '/' or '.', or if there is a NUL in a directory entry somewhere other than as a separator between entries, then your filesystem has been manipulated or is otherwise corrupted. (Dealing with these characters is outside the scope of this column, primarily because it would take a sector editor that groks and writes to Unix-like filesystems to simulate the problem, and I don't know of any.) Note also that if you're running Cygwin, you'll inherit the colon from NTFS as a similarly restricted character.

As we'll see, these different families of characters present different challenges to you, our intrepid deleter.

Reconnaissance

The best way to teach the various approaches we can use to delete bad characters is by example. Let's imagine that you are roaming around your filesystem, minding your own business, when you discover that some strange new files have appeared in your projects directory:

admin@unixlike$ whoami
admin
admin@unixlike$ pwd
/home/admin/projects
admin@unixlike$ /bin/ls -lA
total 52
-rw-r-----  1 admin  admin   22 May  5  2007 ?
-rw-r-----  1 admin  admin   27 May  5  2007 ?
-rw-r-----  1 admin  admin   20 May  5  2007 ?
-rw-r-----  1 admin  admin   26 May  5  2007 ?
-rw-r-----  1 admin  admin   32 May  5  2007 ?
-rw-r-----  1 admin  admin   23 May  5  2007  
-rw-r-----  1 admin  admin   32 May  5  2007  ? 
-rw-r-----  1 admin  admin   23 May  5  2007    
-rw-r-----  1 admin  admin   22 May  5  2007 !
-rw-r-----  1 admin  admin   29 May  5  2007 !deleteme
-rw-r-----  1 admin  admin   29 May  5  2007 !echo
-rw-r-----  1 admin  admin   29 May  5  2007 !keeper
-rw-r-----  1 admin  admin   27 May  5  2007 &
-rw-r-----  1 admin  admin   26 May  5  2007 *
-rw-r-----  1 admin  admin   22 May  5  2007 -
-rw-r-----  1 admin  admin   29 May  5  2007 -keeper
-rw-r-----  1 admin  admin   22 May  5  2007 0
-rw-r-----  1 admin  admin   29 May  5  2007 >
-rw-r-----  1 admin  admin   30 May  5  2007 ?
-rw-r-----  1 admin  admin   29 May  5  2007 A
-rw-r-----  1 admin  admin   29 May  5  2007 Z
-rw-r-----  1 admin  admin   29 May  5  2007 \
-rw-r-----  1 admin  admin   29 May  5  2007 a
-rw-r-----  1 admin  admin   29 May  5  2007 z
-rw-r-----  1 admin  admin   23 May  5  2007 ~
-rw-r-----  1 admin  admin   29 May  5  2007 ~keep-me-too
-rw-r-----  1 admin  admin   21 May  5  2007 ?

Imagine further that you have some Very Important Project Files that you need to keep, as follows:

!keeper
-keeper
0
A
Z
a
z
~keep-me-too
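
If you would like to follow along without risking real files, a scratch directory can be populated with the visible names from the listing. This is only a sketch: the sandbox location comes from mktemp, the variable names are my own, and the invisible and 8-bit names are deliberately left out. On a case-insensitive filesystem, A and a (and Z and z) will collapse into one file each.

```shell
#!/bin/sh
# Build a throwaway directory holding the *visible* problem names.
# Nothing outside the mktemp directory is touched.
sandbox=$(mktemp -d) || exit 1
cd "$sandbox" || exit 1

# Single quotes stop the shell from interpreting any of these characters.
# The ./ prefix keeps names that start with - from being read as options.
for name in '!' '!deleteme' '!echo' '!keeper' '&' '*' '-' '-keeper' \
            '0' '>' '?' 'A' 'Z' '\' 'a' 'z' '~' '~keep-me-too'
do
    : > "./$name"    # create an empty file with exactly this name
done

ls -A | wc -l
echo "sandbox is $sandbox"
```

When you are finished experimenting, the whole sandbox can be removed by its full path from outside the directory.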

Some oddities should immediately stand out: There are filenames that are completely invisible, filenames that are not left-flush against the date column, and filenames with some characters that should just generally make experienced shell users uneasy.

There are also multiple files that appear to be named '?'. Since most filesystems don't allow duplicate filenames, something else must be going on. On Linux and the BSDs, this is the way that /bin/ls tries to sensibly present some otherwise unprintable characters.

This may sound simple enough, but some detailed explanation is warranted. Even though it's making our deletion job here potentially harder, this character obfuscation is actually a Good Thing. It suppresses those character sequences that might otherwise do some surprising things to your terminal.

On many Solaris and HP-UX systems, for example, the stock /bin/ls tries to represent unprintable characters in filenames a little too faithfully. Listing our directory above on one of these systems would briefly flash some directory listing information, emit a beep, clear your screen, and then display:

admin@flare$ /bin/ls -lA
[beep]
[screen clears]

-rw-r-----   1 admin    admin        32 May  5  2007
-rw-r-----   1 admin    admin        23 May  5  2007
-rw-r-----   1 admin    admin        29 May  5  2007
-rw-r-----   1 admin    admin        29 May  5  2007
-rw-r-----   1 admin    admin        22 May  5  2007 !
-rw-r-----   1 admin    admin        29 May  5  2007 !deleteme
     [ ... some identical output omitted ... ]
-rw-r-----   1 admin    admin        21 May  5  2007 ¥

This faithful reproduction is a double-edged sword: On the one hand, it makes whitespace filenames and some unprintable characters pretty difficult to tell apart (see the first two entries); on the other hand, it has revealed some potentially useful information about the last character in the directory. We'll see more about this one later.

Fortunately, the alternate, Berkeley-derived version of ls available on most Suns as /usr/ucb/ls will give you exactly the same output that the other systems' /bin/ls does. (Unfortunately, there is no similar native HP-UX alternative that I am aware of.)

It's handy to know that ls provides this protection only when it is writing directly to a terminal. To see the unfiltered characters in all their glory, you can just pipe your ls through cat; ls then sees a pipe rather than a terminal and stops escaping.
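
If you would rather not send raw control characters to your terminal at all, the same pipe trick can feed od instead of cat. The following is a minimal sketch, assuming a scratch directory from mktemp and a hypothetical test file whose name is a single ESC character. od -c prints every byte of the listing and shows non-printing bytes as octal values, so nothing reaches the terminal unescaped.

```shell
#!/bin/sh
sandbox=$(mktemp -d) || exit 1
cd "$sandbox" || exit 1

# Hypothetical test file: its name is one ESC character (octal 033).
esc=$(printf '\033')
: > "./$esc"
: > ./plain

# ls is writing to a pipe here, so it passes the raw bytes through.
# od -c then renders each byte, printable or not, in a safe form.
listing=$(ls -A | od -c)
printf '%s\n' "$listing"
```

The exact layout of the od output differs a little between systems, but the ESC byte should appear as 033 rather than as a question mark.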

Now that we've seen the enemy, let's engage. (Insert your favorite Picard joke here.)

Battle Simulation

The examples that I present here demonstrate multiple attack vectors. As we tackle each character in turn, the size of your toolbox should expand. If your situation is mentioned but the methods listed don't work for your flavor of Unix, keep reading. One of the later methods may do the trick.

Let's start with the hyphen or "dash".

admin@unixlike$ /bin/ls -lA -
-rw-r-----  1 admin  admin  22 May  5  2007 -

Couldn't we just ... well ... delete it? Generally speaking, yes:

admin@unixlike$ /bin/ls -lA -
-rw-r-----  1 admin  admin  22 May  5  2007 -
admin@unixlike$ rm -
admin@unixlike$ 

But that is where we would have run into our first snag on some Unix-likes. On some flavors (Solaris, for example), rm - would have given this response instead:

admin@flare$ rm -
usage: rm [-fiRr] file ...

Let's discuss what we could have done if we weren't under some specific restrictions to preserve the other files in the directory.

On the one hand, if the - was the only file in this directory with a length of one character (or if you wanted to hit the problem with a sledgehammer), then this approach would have been fine:

admin@unixlike$ /bin/ls -lA -
-rw-r-----  1 admin  admin  22 May  5  2007 -
admin@unixlike$ rm ?
admin@unixlike$ 

... but this would remove most of the other files in our directory, too.

On the other hand, if we didn't have to keep -keeper, we could just remove all of the files that start with "-" by using a wildcard:

admin@unixlike$ /bin/ls -lA -
-rw-r-----  1 admin  admin  22 May  5  2007 -
admin@unixlike$ rm -*
admin@unixlike$ 

... but this would take our aptly named -keeper with it.

On the gripping hand, there are a couple of better general solutions. It's handy to know that most commands (it is the command, not the shell, that interprets them) accept two hyphens (--) as a signal that you have finished passing options and are now ready to start giving the command filenames or other arguments to actually operate on. Using this method, you can easily remove files that begin with a hyphen without having to use wildcards or rely on other unique criteria of the file, and it even works on Solaris:

admin@unixlike$ /bin/ls -lA -*
-rw-r-----  1 admin  admin  22 May  5  2007 -
-rw-r-----  1 admin  admin  29 May  5  2007 -keeper
admin@unixlike$ rm -- -
admin@unixlike$ /bin/ls -lA -*
-rw-r-----  1 admin  admin  29 May  5  2007 -keeper
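
The same exchange can be rehearsed in a scratch directory before trying it on real files. This sketch assumes a sandbox created by mktemp; the file names mirror the ones above. The end-of-options marker is described in the POSIX utility syntax guidelines, but an individual command on your system may still ignore it, so a rehearsal like this is worth the few seconds.

```shell
#!/bin/sh
sandbox=$(mktemp -d) || exit 1
cd "$sandbox" || exit 1
: > ./-          # the file we want gone
: > ./-keeper    # the file we must keep

# Everything after -- is treated as an operand, so "-" is a filename here.
rm -- -

remaining=$(ls -A)
echo "$remaining"
```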

There's another way that you can tackle the hyphen: by using a full or relative path to the file (so that the argument no longer begins with a hyphen, and rm treats it as a file and not as a flag):

admin@unixlike$ /bin/ls -lA -*
-rw-r-----  1 admin  admin  22 May  5  2007 -
-rw-r-----  1 admin  admin  29 May  5  2007 -keeper
admin@unixlike$ rm ./-
admin@unixlike$ /bin/ls -lA -*
-rw-r-----  1 admin  admin  29 May  5  2007 -keeper

Using find amounts to the same thing, because each result in find's output will be rooted in whatever path you specified as the starting point for the search:

admin@unixlike$ pwd
/home/admin/projects
admin@unixlike$ find /home/admin/projects -name '-' | xargs echo
/home/admin/projects/-
admin@unixlike$ find . -name '-' | xargs echo
./-
admin@unixlike$ find . -name '-' | xargs rm
admin@unixlike$ 
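
One caution about the pipeline above: xargs splits its input on whitespace, which is harmless for a file named "-" but would misfire on the whitespace filenames from our list. A sketch of an alternative, assuming a mktemp sandbox and a hypothetical file named "two words", is to let find run rm itself with -exec. Note that find also descends into subdirectories, so the dry run is worth keeping.

```shell
#!/bin/sh
sandbox=$(mktemp -d) || exit 1
cd "$sandbox" || exit 1
: > ./-
: > ./-keeper
: > './two words'    # hypothetical name with an embedded space

# Dry run first: list what matches before deleting anything.
find . -type f -name '-' -exec ls -l {} \;

# find hands each match to rm as one argument, rooted at ./,
# so neither a leading hyphen nor embedded whitespace is misread.
find . -type f \( -name '-' -o -name 'two words' \) -exec rm {} \;

remaining=$(ls -A)
echo "$remaining"
```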

Escape is the New Attack

The last few methods I've demonstrated can be used to sidestep some of the other problems that arise when using some of our characters, especially shell special characters. Escaping helps, but there are other forces at work. Let's run through a few of them to see how they work.

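As you may know, you can also put a backslash (\) in front of our hyphen. Be aware, though, that the hyphen is not special to the shell: the shell strips the backslash and rm still receives a bare -, so this works only where plain rm - already works (and it will not help with a name like -keeper):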

admin@unixlike$ /bin/ls -lA -*
-rw-r-----  1 admin  admin  22 May  5  2007 -
-rw-r-----  1 admin  admin  29 May  5  2007 -keeper
admin@unixlike$ rm \-
admin@unixlike$ /bin/ls -lA -*
-rw-r-----  1 admin  admin  29 May  5  2007 -keeper

If you have any experience with wildcards, you already know why attempting to remove a file that's actually named "?" (the question mark) would be ill-advised, as it will remove all of your one-character-long filenames:

### Don't do this:
admin@unixlike$ rm ?
admin@unixlike$ /bin/ls -lA
total 12
-rw-r-----  1 admin  admin  29 May  5  2007 !deleteme
-rw-r-----  1 admin  admin  29 May  5  2007 !echo
-rw-r-----  1 admin  admin  29 May  5  2007 !keeper
-rw-r-----  1 admin  admin  29 May  5  2007 -keeper
-rw-r-----  1 admin  admin  29 May  5  2007 ~keep-me-too

... or, worse:

### Don't do this, either:
admin@unixlike$ rm *
admin@unixlike$ /bin/ls -lA
admin@unixlike$

For both shell wildcards, escaping them with backslash is the way to go. To make sure that your escaping is working correctly, you can test the syntax with a quick ls command before you rm:

admin@unixlike$ /bin/ls -lA \* \?
-rw-r-----  1 admin  admin   26 May  5  2007 *
-rw-r-----  1 admin  admin   30 May  5  2007 ?
admin@unixlike$ rm \*
admin@unixlike$ rm \?
admin@unixlike$ 
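
Single quotes are an alternative to the backslash in Bourne-family shells: nothing inside them is expanded. The sketch below assumes a mktemp sandbox with a few stand-in files, and uses printf as a harmless preview of the exact arguments rm would receive.

```shell
#!/bin/sh
sandbox=$(mktemp -d) || exit 1
cd "$sandbox" || exit 1
for name in '*' '?' A Z a z; do : > "./$name"; done

# Preview: printf prints one line per argument it is given.
# Two lines means the wildcards were not expanded.
printf 'would remove: %s\n' '*' '?'

# Quoted, the wildcards are literal one-character filenames.
rm -- '*' '?'

remaining=$(ls -A | tr '\n' ' ')
echo "$remaining"
```

If the preview prints more lines than you expected, the shell expanded something, and the rm should not be run as written.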

The tilde (~) is expanded by some shells as your home directory (or whatever is in the HOME environment variable). Attempts to remove it will have predictable results (from refusal as an ordinary user to some potentially wide-sweeping results as root):

admin@unixlike$ echo $HOME
/home/admin
admin@unixlike$ echo ~
/home/admin
admin@unixlike$ rm ~
rm: cannot remove `/home/admin': Permission denied
admin@unixlike$ rm \~
admin@unixlike$ 

or:

admin@unixlike$ rm ./~
admin@unixlike$ 

The ampersand (&) is used to start the current command as a job in the background, returning the command shell to you immediately. Trying rm & will get you the same results as executing rm with no arguments, only with some unexpected output:

admin@unixlike$ rm &
[1] 13347
admin@unixlike$ usage: rm [-f|-i] [-dPRrvW] file ...

[1]+  Exit 1                  rm
admin@unixlike$ rm ./&
[1] 16328
admin@unixlike$ rm: cannot remove `./': Is a directory

[1]+  Exit 1                  rm ./
admin@unixlike$ rm \&
admin@unixlike$ 

The backslash (\) itself usually just needs to be escaped by another backslash. Without it, many shells will think that you want to continue the command on another line, will change the prompt to indicate that they need more input, and then will wait patiently (forever, or until the next reboot):

#### bash, ksh
admin@unixlike$ rm \
> [we can press enter to complete the command]
[rm explains its usage as if we just typed "rm [enter]"]
admin@unixlike$ rm \\
admin@unixlike$

#### csh
% rm \
? [We can also press Control-C to interrupt the command - safer.]
% rm \\
%

#### tcsh
> rm \
? [press Control-C]
> rm \\
>

Note that using the prepended-path trick will not work for the backslash, because most shells interpret a trailing backslash as a continuation-of-command indicator, and use them to assemble the full command line before parsing the literal characters. We have to escape our backslash in order to delete it:

admin@unixlike$ rm /home/admin/projects/\
> [press Control-C]
admin@unixlike$ rm /home/admin/projects/\\
admin@unixlike$ 

or just:

admin@unixlike$ rm \\
admin@unixlike$ 
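
In Bourne-family shells (sh, ksh, bash), single quotes offer another way to name the backslash, because inside them a backslash is an ordinary character. A minimal sketch, assuming a mktemp sandbox and a stand-in file called keeper:

```shell
#!/bin/sh
sandbox=$(mktemp -d) || exit 1
cd "$sandbox" || exit 1
: > './\'
: > ./keeper

# Inside single quotes the backslash neither escapes anything
# nor continues the command onto the next line.
rm -- '\'

# The full-path form also works once the backslash is quoted.
: > './\'
rm "$sandbox"/'\'

remaining=$(ls -A)
echo "$remaining"
```

I have not shown csh or tcsh here; their quoting rules differ, so test there separately.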

Doomed to repeat the past?

Time for another history lesson of sorts. The exclamation point or "bang" (!) is used by some shells to recall portions of your command history and substitute them into the current command. In an interesting twist, unlike many of our other bad tenants, you can easily remove the single-character filename "!".

admin@unixlike$ rm \!
admin@unixlike$

Deleting a longer filename that begins with a bang, however, takes some extra mojo and can have some unintended consequences unless you're careful. A bang followed by a string wields special power: it searches our command history for the most recent command starting with that string and executes it. Notice that the output shows you what is about to be executed just before the actual running of the command:

admin@unixlike$ echo Z
Z
admin@unixlike$ !echo
echo Z
Z

Next, we explore what some deletion attempts might do. Since we haven't issued any commands that start with "deleteme", an unescaped attempt to delete it simply fails without doing any harm:

admin@unixlike$ rm !deleteme
-bash: !deleteme: event not found

But our recent "echo Z" command comes back to bite us:

admin@unixlike$ rm !echo
rm echo Z
rm: echo: No such file or directory
admin@unixlike$ /bin/ls -lA Z
ls: Z: No such file or directory

Sadly, we've deleted our must-keep Z file. Here's how we should have done it (with escapes):

admin@unixlike$ rm \!deleteme
admin@unixlike$ rm \!echo
admin@unixlike$ 
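
In bash, single quotes also suppress history expansion, while double quotes do not. The sketch below assumes a mktemp sandbox with stand-in files. Bear in mind that history expansion is only active in interactive sessions, so in a script like this one the bang is already inert; the quoting matters when you type the same rm at a prompt.

```shell
#!/bin/sh
sandbox=$(mktemp -d) || exit 1
cd "$sandbox" || exit 1
for name in '!deleteme' '!echo' '!keeper' Z; do : > "./$name"; done

# Single-quoted, each name reaches rm exactly as written.
rm -- '!deleteme' '!echo'

remaining=$(ls -A | tr '\n' ' ')
echo "$remaining"
```

At an interactive bash prompt, set +H turns history expansion off for the rest of the session, which is another way to take the sting out of the bang before a cleanup.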

Misdirection

Just as with the backslash, a filename ending with one of the redirection or pipelining characters can cause the shell to wait for more input or otherwise get confused:

admin@unixlike$ rm >
-bash: syntax error near unexpected token `newline'
admin@unixlike$ rm \>

You can handle other redirection characters in similar fashion:

admin@unixlike$ rm <
-bash: syntax error near unexpected token `newline'
admin@unixlike$ rm \<

admin@unixlike$ rm |
> [press Control-C]
admin@unixlike$ rm \|
admin@unixlike$

To Fight Another Day

We've handled the first wave of attackers with relative ease, because we could identify them by sight. The characters that remain are much better at hiding themselves. Next month, we'll break out the night-vision goggles and track them down. I'll also give you some notes about all of the platforms used to test these methods and some handy references for further reading.

Royce Williams is a Unix-like systems administrator for an Alaskan telecommunications company. He was included in the package when they acquired the first Alaskan ISP. When not flushing bad characters to ground, Royce likes watching indie movies and trying to put FreeBSD on ancient hardware. He also has an Alaskan license plate problem. You can reach him at royce@tycho.org.