Reputation: 1

Need command or script to rename a list of files in linux using a pattern match

I have downloaded about 90 FASTA files of bacterial genomes from NCBI. The files have the default names assigned by NCBI, and I need to change them to my own file names. To do this, I have created two .txt files:

file1.txt - lists the default file names provided by NCBI
file2.txt - lists the new names that should replace the defaults

The two files are in the same order, so the 1st entry of file1.txt corresponds to the 1st entry of file2.txt.

All the downloaded files are in one folder, and I need a script that reads file1.txt, matches each entry against the file names in the folder, and renames the file to the corresponding name in file2.txt.

I am not a bioinformatician and am new to this field. I look forward to your help. Can this process be made simpler?

Upvotes: 0

Answers (2)

Tyberius

Reputation: 645

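This can be done with a very small awk one-liner. For convenience, let's first combine file1.txt and file2.txt into a single file to make processing easier. This can be done with paste file1.txt file2.txt > names.txt (using > rather than >> so that rerunning the command overwrites names.txt instead of appending duplicate lines to it).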

names.txt will be a text file with the old names in the first column and the new names in the second. Awk lets us conveniently run through a file line-by-line (or record-by-record in its terminology) and access each column/field.
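Since the rename relies on every line of names.txt holding exactly one old name and one new name, it may be worth checking the file before using it. The following is only a minimal sketch: check_names is a hypothetical helper name, and it assumes that file names contain no whitespace.

```shell
# Hypothetical helper: sanity-check a two-column file such as names.txt.
# Assumes file names contain no whitespace.
check_names() {
    # $1 = file with old names in column 1 and new names in column 2
    awk '
        # A line with a missing or extra field points to mismatched lists.
        NF != 2    { printf "line %d: expected 2 fields, found %d\n", NR, NF; bad = 1; next }
        # Two old files mapped to the same new name would overwrite each other.
        seen[$2]++ { printf "line %d: duplicate new name %s\n", NR, $2; bad = 1 }
        END        { exit (bad + 0) }
    ' "$1"
}
```

Calling check_names names.txt prints one message per problem line and returns a non-zero exit status if any problem was found.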

Assuming you are in the directory containing all these files as well as names.txt, you can simply run awk '{system("mv " $1 " " $2)}' names.txt to rename them all. This runs through every line in names.txt, takes the file name given in the first column, and moves it to the name given in the second column. awk's system() function passes a command string to the shell, which lets you run file operations such as moving (mv), copying (cp), or removing (rm) files. Because awk splits fields on whitespace and the command string is not quoted, this works only for file names without spaces or shell special characters.
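Before renaming anything for real, it can help to print the mv commands and read them over first. The sketch below does that with the same awk approach; preview_renames is a hypothetical helper name, and the same assumption applies (file names without whitespace or shell special characters).

```shell
# Hypothetical helper: print the mv commands for a two-column file such as
# names.txt without executing them (a "dry run").
# Assumes file names contain no whitespace or shell special characters.
preview_renames() {
    # $1 = file with old names in column 1 and new names in column 2
    # Lines that do not have exactly two fields are skipped.
    awk 'NF == 2 { print "mv -- " $1 " " $2 }' "$1"
}
```

Running preview_renames names.txt only prints the commands. If the output looks right, piping it to the shell (preview_renames names.txt | sh) would execute them.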

Upvotes: 2

Timur Shtatland

Reputation: 12367

Use paste and xargs like so:

paste file1.txt file2.txt | xargs --verbose -n2 mv

The command uses paste to write the lines of the 2 files side by side, separated by TABs, to STDOUT. xargs reads this output through a pipe (|). The --verbose option prints each command before running it, and -n2 limits xargs to at most 2 arguments per command, so the commands that are executed look like mv old_file new_file.
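If some names might contain spaces, or if an existing file should never be overwritten, a plain shell loop that reads one line from each list at a time is a possible alternative. This is a sketch under stated assumptions rather than a replacement for the command above: rename_from_lists is a hypothetical function name, and the skip-if-missing and skip-if-exists checks are choices made for this example.

```shell
# Hypothetical helper: rename files using two parallel lists of names.
# Handles names with spaces; skips missing sources and existing targets.
rename_from_lists() {
    # $1 = list of old names, $2 = list of new names (one per line, same order)
    # Descriptor 3 reads the old names, descriptor 4 the new names.
    while IFS= read -r old <&3 && IFS= read -r new <&4; do
        if [ ! -e "$old" ]; then
            echo "skip: $old not found" >&2
        elif [ -e "$new" ]; then
            echo "skip: $new already exists" >&2
        else
            mv -- "$old" "$new" && echo "renamed: $old -> $new"
        fi
    done 3< "$1" 4< "$2"
}
```

From the folder that holds the downloaded files, it would be called as rename_from_lists file1.txt file2.txt.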

Alternatively, use the Perl one-liners below.

Print the commands to rename the files, without executing the commands ("dry run"):

paste file1.txt file2.txt | perl -lane '$cmd = "mv $F[0] $F[1]"; print $cmd;'

Print the commands to rename the files, then actually execute them:

paste file1.txt file2.txt | perl -lane '$cmd = "mv $F[0] $F[1]"; print $cmd; system $cmd;'

As before, paste writes the lines of the 2 files side by side, separated by TABs, to STDOUT, and the pipe (|) passes that output to the Perl one-liner's STDIN.

The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-n : Loop over the input one line at a time, assigning it to $_ by default.
-l : Strip the input line separator ("\n" on *NIX by default) before executing the code in-line, and append it when printing.
-a : Split $_ into array @F on whitespace or on the regex specified in -F option.

$F[0], $F[1] : first and second elements of the array @F into which the line is split. They are old and new file names, respectively.

system executes the command $cmd, which actually moves the files.

Upvotes: 1
