Renaming long file names in bulk

Question

I have file names like:

5_END_1033_ACAGTG_L002_R1_001.fastq.gz
5_END_1033_ACAGTG_L002_R2_001.fastq.gz
40_END_251_GTGAAA_L002_R1_001.fastq.gz
40_END_251_GTGAAA_L002_R2_001.fastq.gz

I want something like:

END_1033_R1.fastq.gz
END_1033_R2.fastq.gz
END_251_R1.fastq.gz
END_251_R2.fastq.gz

Are there good ways to rename these files in Linux?

bedwyr · Accepted Answer

You could try using a loop to extract the important part of the filename:

for file in ./*.gz; do newname=$(echo "$file" | sed -re 's/^([^ACAGTG]+).*(R[1-3]).*/\1\2.fastq.gz/'); echo "$newname"; done

This only prints the new names, so you can check them first. Once they look right, do the actual move:

for file in ./*.gz; do newname=$(echo "$file" | sed -re 's/^([^ACAGTG]+).*(R[1-3]).*/\1\2.fastq.gz/'); mv "$file" "$newname"; done

To break this down a little:

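loop over the *.gz files
create a variable holding the shortened name: sed keeps everything up to the first A, C, G or T (the bracket expression [^ACAGTG] is a character class, not the literal barcode), skips ahead to the read tag (R1 to R3), and appends .fastq.gz
move (rename) the file to that new name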

I expect there are better ways to do this, but it's what I came up with off the top of my head.
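
The steps above can also be written out one per line, which makes them easier to check before anything is renamed. This is only a sketch under a few assumptions: short_name and DRY_RUN are names invented for this example, sed -E is used in place of -r, and the character class is written as [^ACGT], which matches the same characters as [^ACAGTG]. Like the one-liner, it relies on nothing before the barcode containing an uppercase A, C, G or T.

```shell
#!/bin/sh
# DRY_RUN is a hypothetical switch for this sketch: 1 = preview, 0 = rename.
DRY_RUN=${DRY_RUN:-1}

# Build the short name: keep everything before the first A, C, G or T,
# then the read tag (R1 to R3), then the extension.
short_name() {
    printf '%s\n' "$1" | sed -E 's/^([^ACGT]+).*(R[1-3]).*/\1\2.fastq.gz/'
}

for file in ./*.gz; do
    [ -e "$file" ] || continue           # the glob matched nothing
    newname=$(short_name "$file")
    if [ "$DRY_RUN" = 1 ]; then
        echo "mv $file $newname"         # preview only, nothing is moved
    else
        mv -n -- "$file" "$newname"      # -n: never overwrite an existing file
    fi
done
```

Run it as is to see the planned renames, then again with DRY_RUN=0 once the list looks right.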

Test:

$ ls
40_END_251_GTGAAA_L002_R1_001.fastq.gz  40_END_251_GTGAAA_L002_R2_001.fastq.gz  5_END_1033_ACAGTG_L002_R1_001.fastq.gz  5_END_1033_ACAGTG_L002_R2_001.fastq.gz

$ for file in ./*.gz; do newname=$(echo $file | sed -re 's/^([^ACAGTG]+).*(R[1-3]).*/\1\2\.fastq\.gz/g'); echo $newname; done
./40_END_251_R1.fastq.gz
./40_END_251_R2.fastq.gz
./5_END_1033_R1.fastq.gz
./5_END_1033_R2.fastq.gz

$ for file in ./*.gz; do newname=$(echo $file | sed -re 's/^([^ACAGTG]+).*(R[1-3]).*/\1\2\.fastq\.gz/g'); mv $file $newname; done

$ ls
40_END_251_R1.fastq.gz  40_END_251_R2.fastq.gz  5_END_1033_R1.fastq.gz  5_END_1033_R2.fastq.gz

Note: I'm doing this in bash 4.4.5.

EDIT: Given that I'm not entirely sure which fields in the name are the most important, awk might work better. It also drops the leading number, which matches the names asked for in the question:

for file in ./*.gz; do newname=$(echo "$file" | awk -F'_' '{print $2 "_" $3 "_" $6}' -); echo "$newname"; done

This splits the filename on _ and lets you reference the fields you want as $1, $2, and so on. To actually rename the files:

for file in ./*.gz; do newname=$(echo "$file" | awk -F'_' '{print $2 "_" $3 "_" $6}' -); mv "$file" "${newname}.fastq.gz"; done
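
As a rough sketch, the awk version can be wrapped in a small function so the new name is easy to test on a single file first. fields_name is a hypothetical helper name, and the field positions (2, 3 and 6) assume every file follows the same underscore layout as the examples in the question.

```shell
#!/bin/sh
# fields_name is a hypothetical helper used only in this sketch.
fields_name() {
    base=${1##*/}    # drop any leading directory such as ./
    # Fields for 40_END_251_GTGAAA_L002_R1_001.fastq.gz:
    #   $2 = END, $3 = 251, $6 = R1
    printf '%s\n' "$base" | awk -F'_' '{print $2 "_" $3 "_" $6 ".fastq.gz"}'
}

for file in ./*.gz; do
    [ -e "$file" ] || continue    # the glob matched nothing
    # Prints the command instead of running it; remove "echo" to rename.
    echo mv -n -- "$file" "$(fields_name "$file")"
done
```

Because the leading number is dropped, two inputs that differ only in that number would map to the same name; mv -n refuses to overwrite in that case, so such files are left unrenamed rather than lost.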
