lakhujanivijay

Reputation: 107

How to take substring from input file as an argument to a program to be executed in GNU-parallel?

I am trying to run a program (say, biotool) with GNU Parallel. It takes three options: -i, -o and -a.

For example, say I have 10 text files like this:

1_a_test.txt
2_b_test.txt
3_c_test.txt
...
10_j_test.txt

I want to run my tool on all 10 text files. I tried this:

parallel biotool -i {} -o {.}.out -a {} ::: *.txt

I want to pass whatever comes before the first underscore in the input file name as the argument to the -a option, so that the commands that get run look like this (dry run):

biotool -i 1_a_test.txt -o 1_a_test.out -a 1
biotool -i 2_b_test.txt -o 2_b_test.out -a 2
biotool -i 3_c_test.txt -o 3_c_test.out -a 3
...

{} supplies the complete file name to -a, but I only want the substring before the first underscore to be supplied to -a.

Upvotes: 2

Views: 372

Answers (1)

Mark Setchell

Reputation: 207445

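The easiest option, though harder to read, is a Perl expression replacement string. GNU Parallel evaluates the expression between {= and =} on the argument, and here s/_.*// deletes everything from the first underscore onward: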

parallel --dry-run biotool -i {} -o {.}.out -a '{= s/_.*// =}'  ::: *test.txt
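
To see what the s/_.*// substitution computes without involving GNU Parallel, the same regular expression can be tried on a few file names. This is only a sketch: it uses sed as a stand-in for the Perl expression, and the function name strip_after_underscore is made up for illustration.

```shell
#!/bin/bash

# Hypothetical helper: applies the same regex as {= s/_.*// =}, using sed
# instead of Perl. Everything from the first underscore onward is removed.
strip_after_underscore() {
  printf '%s\n' "$1" | sed 's/_.*//'
}

# Sample names taken from the question.
for f in 1_a_test.txt 2_b_test.txt 10_j_test.txt; do
  echo "$f -> $(strip_after_underscore "$f")"
done
```

Each line of output pairs a file name with the value that would be passed to -a.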

Alternatively, you can write a bash function that uses bash parameter substitution to extract the part before the underscore, then export the function so that GNU Parallel can see it:

#!/bin/bash

doit(){
  i=$1
  o=$2
  # Use internal bash parameter substitution to extract whatever precedes "_" 
  # See https://www.tldp.org/LDP/abs/html/parameter-substitution.html
  a=${i/_*/}
  echo biotool -i "$i" -o "$o" -a "$a"
}

export -f doit

parallel doit {} {.}.out ::: *test.txt

Sample Output

biotool -i 10_j_test.txt -o 10_j_test.out -a 10
biotool -i 1_a_test.txt -o 1_a_test.out -a 1
biotool -i 2_b_test.txt -o 2_b_test.out -a 2
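
As a possible variant of the function above, the prefix can also be extracted with the suffix-removal form of parameter expansion, ${i%%_*}, which strips the longest trailing match of _* and so should give the same result as ${i/_*/} for these file names. The sketch below checks that on the sample names; prefix_of is a hypothetical helper name, not part of the original answer.

```shell
#!/bin/bash

# Hypothetical helper: prints whatever precedes the first underscore.
# ${1%%_*} removes the longest suffix that matches the pattern "_*".
prefix_of() {
  echo "${1%%_*}"
}

# Sample names taken from the question.
for f in 1_a_test.txt 2_b_test.txt 10_j_test.txt; do
  echo "biotool -i $f -o ${f%.*}.out -a $(prefix_of "$f")"
done
```

Here ${f%.*} drops the final extension, mirroring what {.} does in the parallel command line.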

Upvotes: 2
