Loourr

Reputation: 5125

Bash Script which recursively makes all text in files lowercase

I'm trying to write a shell script that recursively goes through a directory and, in each file, converts all uppercase letters to lowercase ones. To be clear, I'm not trying to change the file names, only the text inside the files.

Considerations:

  1. This is an old Fortran project which I am trying to make more accessible
  2. I do not want to create a new file but rather write over the old one with the changes
  3. There are several different file extensions in this directory, including .par, .f, .txt, and others

What would be the best way to go about this?

Upvotes: 3

Views: 1579

Answers (5)

gniourf_gniourf

Reputation: 46833

To convert a file from upper case to lower case you can use ex (a good friend of ed, the standard editor):

ex -s file <<EOF
%s/[[:upper:]]\+/\L&/g
wq
EOF

or, if you like stuff on one line:

ex -s file <<< $'%s/[[:upper:]]\+/\L&/g\nwq'

Combining with find, you can then do:

find . -type f -exec bash -c "ex -s -- \"\$0\" <<< $'%s/[[:upper:]]\+/\L&/g\nwq'" {} \;

This method is 100% safe regarding spaces and funny symbols in the file names. No auxiliary files are created, copied or moved; files are only edited.

Edit.

Using glenn jackman's suggestion, you can also write:

find . -type f -exec bash -c 'printf "%s\n" "%s/[[:upper:]]\+/\L&/g" "wq" | ex -s -- "$0"' {} \;

(the pro is that it avoids awkward escapes; the con is that it's longer).

Upvotes: 6

nullrevolution

Reputation: 4137

sed -e 's/\(.*\)/\L\1/g' *

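Or you could feed the file names in from find. Note that \L is a GNU sed extension, and without -i (--in-place) the command above prints the result to standard output instead of changing the files.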
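
A minimal sketch of the find variant, not a definitive implementation. It assumes bash, GNU sed (for \L and -i), and a find/xargs pair that supports -print0 and -0; the function name lowercase_tree_sed is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: lowercase the contents of every regular file under "$1".
# Assumes GNU sed (\L and -i are GNU extensions) and find/xargs with -print0/-0.
lowercase_tree_sed() {
    local dir=${1:?usage: lowercase_tree_sed DIR}
    # NUL-delimited names survive spaces and newlines in file names;
    # -r (GNU xargs) skips running sed when find matches nothing.
    find "$dir" -type f -print0 | xargs -0 -r sed -i 's/.*/\L&/'
}
```

Calling lowercase_tree_sed /path_to_files rewrites every regular file below that directory, so it is worth trying on a copy first.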

Upvotes: 2

Nick Petersen

Reputation: 528

Expanding on @nullrevolution's solution:

find /path_to_files -type f -exec sed --in-place -e 's/\(.*\)/\L\1/g' '{}' \;

This one-liner processes every file in every subdirectory, starting from /path_to_files as the base directory.

WARNING: This will change the case in ALL files in EVERY directory under /path_to_files, so make sure you want to do that before you execute this command. You can limit the scope of the find to a particular file extension as follows:

find /path_to_files -type f -name \*.txt -exec sed --in-place -e 's/\(.*\)/\L\1/g' '{}' \;
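
To cover several extensions at once (the question mentions .par, .f, and .txt), the -name tests can be grouped with -o. The following is a sketch rather than part of the original answer: the function name lowercase_by_ext is hypothetical, and it assumes bash and GNU sed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: lowercase only files whose names end in the given extensions.
# Usage: lowercase_by_ext DIR EXT [EXT...]   e.g. lowercase_by_ext /path_to_files f par txt
# Assumes GNU sed for --in-place and \L.
lowercase_by_ext() {
    local dir=${1:?usage: lowercase_by_ext DIR EXT...}
    shift
    (( $# > 0 )) || { echo "need at least one extension" >&2; return 1; }
    local tests=() ext
    for ext in "$@"; do
        # Build: -name '*.f' -o -name '*.par' -o ...
        (( ${#tests[@]} )) && tests+=( -o )
        tests+=( -name "*.$ext" )
    done
    # The escaped parentheses group the -o alternatives so -type f applies to all of them.
    find "$dir" -type f \( "${tests[@]}" \) -exec sed --in-place -e 's/.*/\L&/' {} +
}
```

Note that -name is case-sensitive, so files ending in .F would need their own entry (or -iname, where find supports it).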

You may also want to make a backup of the original file before modifying the original:

find /path_to_files -type f -name \*.txt -exec sed --in-place=-orig -e 's/\(.*\)/\L\1/g' '{}' \;

This will leave the modified file under its original name and keep an unmodified copy with "-orig" appended to the file name (i.e., file.txt is backed up as file.txt-orig).

An explanation of each piece:

find /path_to_files This sets the base directory to the path provided.

-type f This will search the directory hierarchy for files only.

-exec COMMAND '{}' \; This executes the provided command once for each matched file. The '{}' is replaced by the current file name. The \; indicates the end of the command.

sed --in-place -e 's/\(.*\)/\L\1/g' The --in-place option makes the changes directly in the file, without keeping a backup. The regular expression captures the entire line, the backreference \1 refers to that capture, and \L converts it to lower case.
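
The sed expression explained above can be tried without modifying anything by leaving out --in-place. A small sketch, assuming GNU sed; preview_lowercase is a made-up name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the lowercased form of FILE to stdout, leaving FILE untouched.
# Assumes GNU sed, since \L is a GNU extension.
preview_lowercase() {
    # \(.*\) captures the whole line; \L\1 re-emits it in lower case.
    sed -e 's/\(.*\)/\L\1/' -- "$1"
}
```

Piping the result into diff against the original file shows which lines would change.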

Optional

(For a more archaic solution.)

find /path_to_files -type f -exec dd if='{}' of='{}'-lc conv=lcase \;

Upvotes: 1

Mark Reed

Reputation: 95267

Identifying text files can be a bit tricky in Unix-like environments. You can do something like this:

set -e -o noclobber
while IFS= read -r f; do
   tr 'A-Z' 'a-z' <"$f" >"$f.$$"
   mv "$f.$$" "$f"
done < <(find "$start_directory" -type f -exec file {} + | cut -d: -f1)

This will fail on filenames with embedded colons or newlines, but should work on others, including those with spaces.
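
If colons or newlines in file names are a concern, one possible variant reads NUL-delimited names instead. This sketch swaps the file utility for grep -I as the "is it text?" test, which is a different technique from the one above; the function name is hypothetical, and it assumes bash, find -print0, and a grep that supports -I. What grep treats as binary can depend on the locale, so old files in a non-UTF-8 encoding may be skipped.

```shell
#!/usr/bin/env bash
# Hypothetical helper: lowercase text files under "$1", skipping files grep considers binary.
# Assumes bash (read -d ''), find -print0, and a grep that supports -I.
lowercase_text_files() {
    local dir=${1:?usage: lowercase_text_files DIR}
    local f tmp
    # One scratch file outside the tree, so find never sees it.
    tmp=$(mktemp) || return 1
    while IFS= read -r -d '' f; do
        # -I makes grep treat binary files as non-matching; -q only sets the exit status.
        grep -Iq . -- "$f" || continue
        # Writing back through cat keeps the original inode, so permissions and hard links survive.
        tr '[:upper:]' '[:lower:]' <"$f" >"$tmp" && cat -- "$tmp" >"$f"
    done < <(find "$dir" -type f -print0)
    rm -f -- "$tmp"
}
```

Empty files are skipped as well, since grep finds nothing to match in them.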

Upvotes: 0

Maxim Shoustin

Reputation: 77904

You can translate all uppercase characters (A–Z) to lowercase (a–z) using the tr command and specifying a range of characters, as in:

$ tr 'A-Z' 'a-z' <be.fore >af.ter

There is also special syntax in tr for specifying this sort of range for upper- and lowercase conversions:

$ tr '[:upper:]' '[:lower:]' <be.fore >af.ter

The tr utility copies its input to its output, substituting or deleting selected characters. The name tr is short for translate (or transliterate). It takes two sets of characters as parameters and replaces each occurrence of a character in the first set with the corresponding character from the second set.

tr "set1" "set2" < input.txt > output.txt

Although tr doesn't support regular expressions, it does support ranges of characters.

Just make sure that both arguments end up with the same number of characters. If the second argument is shorter, its last character will be repeated to match the length of the first argument. If the first argument is shorter, the second argument will be truncated to match the length of the first.
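
Because tr only reads standard input and writes standard output, overwriting the original file (as the question asks) needs an intermediate file: redirecting straight back to the input would truncate it before tr reads anything. A minimal sketch of that step, with the made-up name tr_lower_in_place and assuming bash and mktemp are available:

```shell
#!/usr/bin/env bash
# Hypothetical helper: lowercase FILE in place using tr and a temporary file.
# tr cannot read and write the same file through redirections, because the shell
# truncates the output file before tr reads it.
tr_lower_in_place() {
    local f=${1:?usage: tr_lower_in_place FILE}
    local tmp
    tmp=$(mktemp) || return 1
    # Convert into the temp file first, then copy the result back over the original.
    if tr '[:upper:]' '[:lower:]' <"$f" >"$tmp"; then
        cat -- "$tmp" >"$f"
    fi
    rm -f -- "$tmp"
}
```

For example, tr_lower_in_place be.fore rewrites be.fore itself instead of producing af.ter.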

Upvotes: 2
