Hari Sundararajan

Reputation: 678

Bash scripts to change file names

I have a huge list of mp3 files, the names of which I want to process neatly and efficiently.

First, I want to remove spaces in file names. I am using tr -d ' '. Is there any situation where this might fail?

Second, a lot of file names start with one or more digits, for instance 01-filename.mp3 or 02_file.mp3. I tried using tr -d [:digit:], but the 3 in mp3 is removed as well, so the file becomes _file.mp. How do I resolve this?

Along similar lines, I have another question. When using sed to make modifications, how do I refer to "rest of the string"? For instance, my first thought was to use a regular expression like ^[0-9] to refer to "starts with a number", but then I was stuck. How do I say "anything that (a) starts with a number, (b) number repeated many times, (c) rest of string" -> replace with (c) rest of string?

Upvotes: 3

Views: 2530

Answers (4)

Brian Agnew

Reputation: 272287

Have you considered the Linux rename command?

Upvotes: 1

Cameron

Reputation: 98746

I don't use tr often enough to be able to comment on tr -d ' ', but the rest of your problems can be solved using the right regex. As a matter of fact, if you're using sed, you can add a space-removing regex and eliminate the need for tr:

sed -r -e 's/ +//g' -e 's/^[0-9]*[_-]*(.+\.mp3)$/\1/I'

The -r option tells GNU sed to turn on its extended regular expression mode (-E is the more portable spelling), so that "new" features like the + modifier can be used. Each expression following a -e is applied to each line, in the order specified.

The first one substitutes one or more (+) spaces with nothing, for all matches (g), not just the first one.

The second regex strips any leading digits, and any underscores or hyphens that follow them, from a name ending in .mp3. Square brackets indicate a set of characters to match, and the - in [0-9] indicates a range in the set (in [_-] it comes last, so it is a literal hyphen). The * means "match zero or more of the preceding item", so a name with no leading digits still matches and is left unchanged. Round parentheses are used to "group" part of the match for later use. The .+ matches all the leftover characters, and the \.mp3 matches the extension of the filename (the . is escaped with a backslash since it normally means "any character", but we need a literal .). The \1 in the replacement string refers to the first (and only, in this case) group. Finally, the I modifier makes the match case-insensitive.

There are lots of regex tutorials online if you want to learn more. The Perl regular expression tutorial is particularly good (and most regex engines are largely Perl-compatible).
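To see what the two expressions do in practice, here is a small sketch that feeds a few sample names through the command above. The clean_name function and the sample names are only illustrative, and the command assumes GNU sed, since both -r and the I flag are GNU extensions.

```shell
# clean_name is a hypothetical wrapper around the sed command from this answer.
# Assumes GNU sed (-r and the I flag are GNU extensions).
clean_name() {
    printf '%s\n' "$1" | sed -r -e 's/ +//g' -e 's/^[0-9]*[_-]*(.+\.mp3)$/\1/I'
}

clean_name '01 - my song.mp3'   # prints: mysong.mp3
clean_name '02_file.MP3'        # prints: file.MP3 (I makes the match case-insensitive)
clean_name 'notes.txt'          # prints: notes.txt (no .mp3 extension, so no match)
```

This only transforms the text of a name; the actual rename would still need an mv call with both names quoted.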

Upvotes: 0

Slartibartfast

Reputation: 1700

First, I want to remove spaces in file names. I am using tr -d ' '. Is there any situation where this might fail?

Certainly. What if you have two files whose names are identical except for spaces? One might well overwrite the other unintentionally, or the rename could fail. Also, dealing with filenames that contain spaces can be a challenge; you must remember to quote them correctly.

In response to your other issues, rather than modifying the existing names, you might consider building new names from the ID3 tags inside the files. You might try 'id3ren'.
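A minimal sketch of a rename loop that guards against both problems might look like this. The strip_spaces name is hypothetical, and the sketch assumes all the files sit directly in one directory; it quotes every expansion and skips a rename when the target name is already taken.

```shell
# strip_spaces DIR: hypothetical helper, not something from the question.
# Removes spaces from the names of the *.mp3 files directly inside DIR.
strip_spaces() {
    dir=$1
    for f in "$dir"/*.mp3; do
        [ -e "$f" ] || continue                  # glob matched nothing
        base=${f##*/}                            # file name without the directory part
        new=$(printf '%s' "$base" | tr -d ' ')   # same tr call as in the question
        [ "$new" = "$base" ] && continue         # no spaces, nothing to do
        if [ -e "$dir/$new" ]; then
            # Two names differ only by spaces: report it instead of overwriting.
            echo "skipping '$base': '$new' already exists" >&2
            continue
        fi
        mv -- "$f" "$dir/$new"                   # -- protects names starting with a dash
    done
}
```

It would be called as strip_spaces /path/to/music. Trying it on a copy of the files first is a reasonable precaution.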

Upvotes: 0

Kevin Stricker

Reputation: 17388

Something like this: (untested)

sed -e 's/^[0-9]\+\(.*\)$/\1/'

Basically,

  1. Use \+ for "repeated one or more times".
  2. Group the "rest of string" match with \(.*\). Note that .* also matches when the rest of the string is empty, which would be bad in your case.
  3. Use the backreference \1 to refer to the rest of the string.
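The empty-remainder case from point 2 can be seen in a short sketch. The function names and sample inputs are made up for illustration, and the second expression is one possible guard rather than part of the answer above: it requires the remainder to begin with a non-digit, so a name consisting only of digits is left alone.

```shell
# The expression as posted (\+ in basic regex syntax is a GNU extension).
strip_digits() {
    printf '%s\n' "$1" | sed -e 's/^[0-9]\+\(.*\)$/\1/'
}

# Hypothetical variant: the remainder must start with a non-digit,
# so an all-digit name no longer collapses to an empty string.
strip_digits_safe() {
    printf '%s\n' "$1" | sed -e 's/^[0-9][0-9]*\([^0-9].*\)$/\1/'
}

strip_digits '01-filename.mp3'    # prints: -filename.mp3
strip_digits '2009'               # prints an empty line
strip_digits_safe '2009'          # prints: 2009
strip_digits_safe '02_file.mp3'   # prints: _file.mp3
```

Both versions leave the separator (- or _) in place; removing it as well needs an extra [_-]* after the digits.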

Upvotes: 0
