Reputation: 1
I have downloaded some 90 FASTA files of bacterial genomes from NCBI. The downloaded files have the default names given by NCBI, and I need to change them to my desired file names. To do this, I have created two .txt files:
file1.txt - lists the default file names provided by NCBI.
file2.txt - lists the names that should replace the defaults.
Both files are in the same order, so that the 1st entry of file1.txt corresponds to the 1st entry of file2.txt.
All the downloaded files are in one folder, and I need a script that reads file1.txt, matches each name against the files in the folder, and renames the file to the corresponding name in file2.txt.
I am not a bioinformatician and am new to this field. I look forward to your help. Can this process be made simpler?
Upvotes: 0
Views: 194
Reputation: 645
This can be done with a very small awk one-liner. For convenience, let's first combine your file1.txt and file2.txt to make processing easier:
paste file1.txt file2.txt > names.txt
names.txt will be a text file with the old names in the first column and the new names in the second. (A single > overwrites names.txt; >> would append to an existing names.txt and could leave stale lines in it.) Awk lets us conveniently run through a file line-by-line (or record-by-record in its terminology) and access each column/field.
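As a rough sketch of the combining step, here is the same thing on toy data, with a check that both lists have the same length before they are pasted together. The scratch directory and the file names (GCF_000001.fna, ecoli_K12.fasta and so on) are invented for illustration, not taken from the question.

```shell
# Toy data in a scratch directory; all file names here are hypothetical.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' GCF_000001.fna GCF_000002.fna > file1.txt         # stand-ins for NCBI names
printf '%s\n' ecoli_K12.fasta bsubtilis_168.fasta > file2.txt   # stand-ins for new names

# The two lists must have the same number of lines, or the pairing shifts.
n1=$(( $(wc -l < file1.txt) ))
n2=$(( $(wc -l < file2.txt) ))
if [ "$n1" -ne "$n2" ]; then
    echo "file1.txt and file2.txt differ in length" >&2
    exit 1
fi

# '>' overwrites names.txt; paste joins the two columns with a TAB.
paste file1.txt file2.txt > names.txt
cat names.txt
```

If the length check fails, it is worth fixing the lists first, since every rename after the mismatch would otherwise pair the wrong names.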
Assuming you are in the directory with all these files, as well as names.txt, you can simply run
awk '{system("mv " $1 " " $2)}' names.txt
to rename them all. This runs through all the lines in names.txt, takes the filename given in the first column, and moves it to the name given in the second column. Awk's system() function passes a command string to the shell, which gives you access to basic file operations like moving (mv), copying (cp), or removing (rm) files. Note that the names are inserted into the command unquoted, so this assumes they contain no spaces or shell special characters.
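If some names might contain spaces, one possible variation is a plain shell loop that reads names.txt and quotes both names. This is a sketch on invented toy files, not a replacement for the awk command; it assumes names.txt is TAB-separated, as paste produces it.

```shell
# Toy setup in a scratch directory; all file names here are hypothetical.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
: > "GCF 000001.fna"     # a name containing a space
: > GCF_000002.fna
printf 'GCF 000001.fna\tecoli_K12.fasta\nGCF_000002.fna\tbsubtilis_168.fasta\n' > names.txt

# Split each line on TAB only, so spaces inside a name are preserved.
tab=$(printf '\t')
while IFS=$tab read -r old new; do
    if [ ! -e "$old" ]; then
        echo "skipping: $old not found" >&2
        continue
    fi
    # '--' stops option parsing in case a name starts with '-'.
    mv -- "$old" "$new"
done < names.txt
ls
```

The loop skips entries whose old file is missing instead of stopping, which makes it safe to re-run after a partial rename.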
Upvotes: 2
Reputation: 12367
paste file1.txt file2.txt | xargs --verbose -n2 mv
The command uses paste to write lines from the 2 files side by side, separated by TABs, to STDOUT. The STDOUT is read by xargs through a pipe (|). Option --verbose prints each command before running it, and option -n2 sets the maximum number of arguments per command to 2, so that the commands that are executed are something like mv old_file new_file. Because xargs splits its input on any whitespace, this assumes the file names contain no spaces.
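A dry run is also possible with xargs by putting echo in front of mv, so the commands are only printed. The sketch below uses invented toy file names in a scratch directory, and uses -t, which is the POSIX spelling of --verbose.

```shell
# Toy setup in a scratch directory; all file names here are hypothetical.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
: > GCF_000001.fna
: > GCF_000002.fna
printf '%s\n' GCF_000001.fna GCF_000002.fna > file1.txt
printf '%s\n' ecoli_K12.fasta bsubtilis_168.fasta > file2.txt

# Dry run: echo stands in for the real command, so nothing is renamed yet.
paste file1.txt file2.txt | xargs -n 2 echo mv

# Real run: -t prints each mv command to stderr before executing it.
paste file1.txt file2.txt | xargs -t -n 2 mv
ls
```

Checking the dry-run output against the folder before the real run is a cheap way to catch a mismatched pair.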
Alternatively, use the Perl one-liners below.
Print the commands to rename the files, without executing the commands ("dry run"):
paste file1.txt file2.txt | perl -lane '$cmd = "mv $F[0] $F[1]"; print $cmd;'
Print the commands to rename the files, then actually execute them:
paste file1.txt file2.txt | perl -lane '$cmd = "mv $F[0] $F[1]"; print $cmd; system $cmd;'
The command uses paste to write lines from the 2 files side by side, separated by TABs, to STDOUT, which is passed through a pipe (|) to the Perl one-liner's STDIN.
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-n : Loop over the input one line at a time, assigning it to $_ by default.
-l : Strip the input line separator ("\n" on *NIX by default) before executing the code in-line, and append it when printing.
-a : Split $_ into array @F on whitespace or on the regex specified in -F option.
$F[0], $F[1] : first and second elements of the array @F into which the line is split. They are the old and new file names, respectively.
system executes the command $cmd, which actually moves the files.
SEE ALSO:
perldoc perlrun : how to execute the Perl interpreter: command line switches
Upvotes: 1