dragon951

Reputation: 396

Combining replacement strings and regular expressions in GNU Parallel

I have a list of file paths of the format:

/data/nicotine_sensi/bam/9-2_box_1_S23_starAligned.sortedByCoord.out.bam
/data/nicotine_sensi/bam/9-2_box_3_S101_starAligned.sortedByCoord.out.bam
/data/nicotine_sensi/bam/9-3_box_1_S24_starAligned.sortedByCoord.out.bam
/data/nicotine_sensi/bam/9-3_box_3_S102_starAligned.sortedByCoord.out.bam

I want to feed these into a GNU Parallel command so that a predefined replacement string and a Perl or --plus replacement string are applied to the same argument, but I couldn't find a solution in the tutorials. Ideally, {/...} and {%_starAligned} would work together to produce:

9-2_box_1_S23
9-2_box_3_S101
9-3_box_1_S24
9-3_box_3_S102

However, the closest I have gotten is:

parallel --rpl '{..} s:/data/nicotine_sensi/bam/::;s:_starAligned.sortedByCoord.out.bam::' \
  echo {..} ::: $(ls $bam_dir/*.bam)

This is messy, and because the directory is hard-coded it does not carry over to other directories.

Upvotes: 2

Views: 583

Answers (1)

Ole Tange

Reputation: 33685

The definition of {/...} is:

s:.*/::; s:\.[^/.]+$::; s:\.[^/.]+$::; s:\.[^/.]+$::;

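The definition of {%(.*)} is as follows, where $$1 stands for the text matched by the parenthesized group in the replacement string: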

s/$$1$//;

Combined into a single custom replacement string, you could do:

echo /data/nicotine_sensi/bam/9-3_box_1_S24_starAligned.sortedByCoord.out.bam |
  parallel --rpl '{¤([^}]+?)} s:.*/::; s:\.[^/.]+$::; s:\.[^/.]+$::; s:\.[^/.]+$::; s/$$1$//;' echo {¤_starAligned}

If you know you will always remove a trailing _something, then:

echo /data/nicotine_sensi/bam/9-3_box_1_S24_starAligned.sortedByCoord.out.bam |
  parallel --rpl '{¤} s:.*/::; s:\.[^/.]+$::; s:\.[^/.]+$::; s:\.[^/.]+$::; s/_[^_]+$//;' echo {¤}

If you will be using this a lot, putting it in a profile is probably a good idea.
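
A profile for this might look like the sketch below. It assumes GNU Parallel's per-user profile directory ~/.parallel/ and a hypothetical profile name, bamname. It also assumes that each profile line is split like shell words, so the single quotes keep the whole definition together as one argument.

```shell
#!/bin/sh
# Sketch: store the custom replacement string in a GNU Parallel profile.
# "bamname" is a hypothetical profile name chosen for this example.
mkdir -p "$HOME/.parallel"

# The quoted heredoc delimiter keeps the shell from expanding $ or backslashes.
cat > "$HOME/.parallel/bamname" <<'EOF'
--rpl '{¤} s:.*/::; s:\.[^/.]+$::; s:\.[^/.]+$::; s:\.[^/.]+$::; s/_[^_]+$//;'
EOF
```

With that file in place, something like parallel -J bamname echo {¤} ::: "$bam_dir"/*.bam should pick up the definition without repeating --rpl on every command line.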

Upvotes: 2
