Bulk renaming file names in Ubuntu/Linux with variable pattern

I am struggling to rename a bunch of files where the part of the name to be removed varies from file to file.

I have:

1B_ACTCGCTA-CCTAGAGT_L001_R1_001.fastq.gz
1B_ACTCGCTA-CCTAGAGT_L001_R2_001.fastq.gz

97C_TAAGGCGA-TTATGCGA_L001_R1_001.fastq.gz
97C_TAAGGCGA-TTATGCGA_L001_R2_001.fastq.gz

98A_S62_L001_R1_001.fastq.gz
98A_S62_L001_R2_001.fastq.gz

and want to have:

1B_R1_001.fastq.gz
1B_R2_001.fastq.gz

97C_R1_001.fastq.gz
97C_R2_001.fastq.gz

98A_R1_001.fastq.gz
98A_R2_001.fastq.gz

As you can see, the part that needs to be dropped is variable, so simple matching won't work. A logical workaround would be to drop everything between the first and third underscore, or between the first underscore and the letter "R". Unfortunately, I have not been able to come up with code that does that. Anything that works is fine: rename, a bash for loop, etc.

Appreciate your help, Deni

EDIT: I tried a for loop, but could not come up with a complete command that retains the second part of the file name (everything from the letter "R" onward):

for file in *.fastq.gz; do echo mv "${file}" "${file/_*/\/}"; done

Upvotes: 1

Views: 386

Answers (3)

Uprooted

Reputation: 971

The following should work:

for f in *.fastq.gz; do echo mv "$f" "${f%%_*}_${f#*_*_*_}"; done

I deliberately put echo before mv so that the loop only prints the commands it would run. If the output looks correct, remove echo and run it again.

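This takes the head of the name with %% and the tail with # and concatenates them: ${f%%_*} removes the longest suffix matching _* (everything from the first underscore on), and ${f#*_*_*_} removes the shortest prefix matching *_*_*_ (everything through the third underscore). See Parameter Expansion in man bash for the meaning of %% and #. The solution relies on the number of underscores before the R part being the same in every file name.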
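To see the two expansions at work on a single name without touching any files, here is a minimal sketch. The function name short_name is made up for illustration and is not part of the answer; it only prints the name the loop above would produce.

```shell
# Hypothetical helper (not from the answer): compute the new name for one file.
short_name() {
    local f="$1"
    local head="${f%%_*}"      # drop longest suffix matching "_*": text before the first underscore
    local tail="${f#*_*_*_}"   # drop shortest prefix matching "*_*_*_": text after the third underscore
    printf '%s_%s\n' "$head" "$tail"
}

short_name 1B_ACTCGCTA-CCTAGAGT_L001_R1_001.fastq.gz   # prints 1B_R1_001.fastq.gz
short_name 98A_S62_L001_R2_001.fastq.gz                # prints 98A_R2_001.fastq.gz
```

Because the tail is found by counting three underscores, a name with an extra underscore before the R part would keep more than intended.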

Upvotes: 2

Mark Setchell

Reputation: 207465

With (Perl) rename:

rename --dry-run 's/_.*_R/_R/' *gz

Sample Output

'1B_ACTCGCTA-CCTAGAGT_L001_R1_001.fastq.gz' would be renamed to '1B_R1_001.fastq.gz'
'1B_ACTCGCTA-CCTAGAGT_L001_R2_001.fastq.gz' would be renamed to '1B_R2_001.fastq.gz'
'97C_TAAGGCGA-TTATGCGA_L001_R1_001.fastq.gz' would be renamed to '97C_R1_001.fastq.gz'
'97C_TAAGGCGA-TTATGCGA_L001_R2_001.fastq.gz' would be renamed to '97C_R2_001.fastq.gz'
'98A_S62_L001_R1_001.fastq.gz' would be renamed to '98A_R1_001.fastq.gz'
'98A_S62_L001_R2_001.fastq.gz' would be renamed to '98A_R2_001.fastq.gz'
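If the Perl rename command is not installed, the same substitution can be approximated with sed inside a loop. This is a sketch under that assumption, not part of the answer above; regex_name is a hypothetical helper name. The greedy .* runs from the first underscore up to the last "_R" in the name, which is what removes the variable middle part.

```shell
# Hypothetical helper: apply the answer's regex to one file name.
regex_name() {
    printf '%s\n' "$1" | sed 's/_.*_R/_R/'
}

for f in *.fastq.gz; do
    [ -e "$f" ] || continue          # glob matched nothing: skip the literal pattern
    new=$(regex_name "$f")
    [ "$new" = "$f" ] && continue    # name has nothing to strip (e.g. already renamed)
    echo mv -- "$f" "$new"           # dry run: remove echo to actually rename
done
```

As with --dry-run, the echo keeps this harmless until the printed commands have been checked.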

Upvotes: 2

Alex Stiff

Reputation: 904

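An answer that doesn't rely on the number of underscores; it keeps the first field and the last two:

# Glob instead of parsing ls, and quote expansions so unusual names survive.
for file in *.fastq.gz; do
    # Keep the first and the last two underscore-separated fields.
    mv -- "$file" "$(printf '%s\n' "$file" | awk -F_ 'BEGIN {OFS="_"} {print $1, $(NF-1), $NF}')"
done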
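The same first-field-plus-last-two-fields idea can also be written with parameter expansion alone, which avoids starting awk once per file. This is a sketch of an alternative, not the answer's own method; field_name is a hypothetical helper name.

```shell
# Hypothetical helper: keep the first and the last two underscore-separated fields.
field_name() {
    local f="$1"
    local first="${f%%_*}"     # text before the first underscore
    local last="${f##*_}"      # text after the last underscore
    local rest="${f%_*}"       # name with the last field removed
    local second="${rest##*_}" # second-to-last field
    printf '%s_%s_%s\n' "$first" "$second" "$last"
}

for file in *.fastq.gz; do
    [ -e "$file" ] || continue                    # glob matched nothing
    echo mv -- "$file" "$(field_name "$file")"    # dry run: remove echo to rename
done
```

Like the awk version, this assumes every name has at least three underscore-separated fields.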

Upvotes: 1
