Reputation: 23
I have multiple files named:
Genus_species_strain.fasta
I want to use sed to print out:
Genus
species
strain
I want to use the "printed" words in a command like this (prokka is a tool for genome annotation):
prokka $file --outdir `echo $file | sed s/\.fasta//` --genus `echo $file | sed s/_.*\.fasta//` --species `echo $file | sed <something here>` --strain `echo $file | sed <something here>`
I would appreciate the help. I am very new to all of this, and as you can see above, I only know how to print out Genus.
Below I have some additional questions (no need to answer these if it only complicates things further). This is one of my attempts to print species, and the questions are the following:
sed s/.*_//1 | sed s/_.*\.fasta//
I know the second command isn't correct. I assume it needs to start from the second _, but I don't know how to do that, since the continuation (that is, .fasta) is unique.
When used alone, sed s/.*_//1 returns strain.fasta. How do I make it not skip the first _?
Combining commands (either as you see above, or with ;) doesn't seem to work for me.
Upvotes: 2
Views: 184
Reputation: 142
One-liners without setting multiple variables
Using sed capture groups: One liner
file='Genus_species_strain.fasta'
$(echo "$file" | sed "s/^\([^_]*\)_\([^_]*\)_\([^.]*\)\..*/prokka $file --outdir \1_\2_\3 --genus \1 --species \2 --strain \3/")
Using Bash string manipulation: One liner
file='Genus_species_strain.fasta'
$(echo prokka "$file" --outdir "${file%.*}" --genus "${file%%_*}" --species "$(f=${file#*_}; echo "${f%%_*}")" --strain "$(f=${file##*_}; echo "${f%.*}")")
Awk one liner
file='Genus_species_strain.fasta'
$(echo "$file" | awk -F'[_.]' '{print "prokka " $0 " --outdir " $1 "_" $2 "_" $3 " --genus " $1 " --species " $2 " --strain " $3}')
You can now use the commands above inside a loop, or with xargs, with the file variable set to each filename in turn. Each one builds a prokka command and executes it directly.
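As a rough sketch of the loop idea, the snippet below walks over every .fasta file in the current directory and prints the prokka command it would run, rather than running it. The helper name prokka_cmd is made up for this illustration, and the sketch assumes filenames follow the Genus_species_strain.fasta pattern and contain no spaces.

```shell
#!/usr/bin/env bash
# prokka_cmd is a hypothetical helper: it only PRINTS the command line,
# so the result can be checked before anything is executed.
prokka_cmd() {
    local file=$1
    local base=${file%.fasta}      # Genus_species_strain
    local genus=${base%%_*}        # text before the first "_"
    local rest=${base#*_}          # species_strain
    local species=${rest%%_*}      # text before the next "_"
    local strain=${rest#*_}        # everything after that "_"
    printf 'prokka %s --outdir %s --genus %s --species %s --strain %s\n' \
        "$file" "$base" "$genus" "$species" "$strain"
}

for file in *.fasta; do
    [ -e "$file" ] || continue     # skip the literal pattern when nothing matches
    prokka_cmd "$file"
done
```

Once the printed commands look right, the printf could be swapped for a direct prokka call with the same quoted variables.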
I hope this works for you. Please accept the answer if you find it more efficient.
Upvotes: 1
Reputation: 11227
Using sed
$ file=path_to_file
$ sed "s/\(\([^_]*\)_\([^_]*\)_\([^.]*\)\).*/prokka $file --outdir \1 --genus \2 --species \3 --strain \4/e" <(echo *.fasta)
Output of command executed
prokka path_to_file --outdir Genus_species_strain --genus Genus --species species --strain strain
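For the separate sed expressions the question asks about, something along these lines may help. The reason sed s/.*_// returns strain.fasta is that .* is greedy, so .*_ matches up to the last underscore; [^_]* cannot cross an underscore, so it stops at the first one. This is a minimal sketch that assumes exactly two underscores before the strain and a .fasta extension.

```shell
#!/usr/bin/env bash
file='Genus_species_strain.fasta'

# Genus: delete from the first "_" to the end of the name.
genus=$(echo "$file" | sed 's/_.*//')

# Species: capture the text between the first and second "_".
species=$(echo "$file" | sed 's/^[^_]*_\([^_]*\)_.*/\1/')

# Strain: drop the first two "_"-terminated fields, then the extension.
# Two commands in one sed call are separated with ";" inside the quotes.
strain=$(echo "$file" | sed 's/^[^_]*_[^_]*_//; s/\.fasta$//')

echo "$genus"     # Genus
echo "$species"   # species
echo "$strain"    # strain
```

Quoting the sed script in single quotes also keeps the shell from touching the backslashes, which the unquoted s/\.fasta// in the question does not guarantee.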
Upvotes: 0
Reputation: 626825
You can use string splitting with string manipulation:
file='Genus_species_strain.fasta'
IFS='_.' read -r genus species strain _ <<< "$file"  # IFS is a set of delimiter characters, not a pattern
outdir="${file%.*}"
Then you can use the variables in the command:
prokka "$file" --outdir "$outdir" --genus "$genus" --species "$species" --strain "$strain"
See this online demo:
#!/bin/bash
file='Genus_species_strain.fasta'
IFS='_.' read -r genus species strain _ <<< "$file"
echo "${file%.*}" # outdir
echo "$genus"
echo "$species"
echo "$strain"
Output:
Genus_species_strain
Genus
species
strain
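If a strain name can itself contain a dot or an underscore, splitting on both characters at once would cut it short. One possible variation, sketched here with a made-up filename, strips the .fasta extension first and then splits on _ only; read puts everything left over into the last variable.

```shell
#!/usr/bin/env bash
# The filename below is invented purely for illustration.
file='Escherichia_coli_K-12.v2.fasta'

base=${file%.fasta}                              # remove the extension first
IFS=_ read -r genus species strain <<< "$base"   # last variable keeps the remainder

echo "$base"      # Escherichia_coli_K-12.v2
echo "$genus"     # Escherichia
echo "$species"   # coli
echo "$strain"    # K-12.v2
```

Here base doubles as the --outdir value, so the prokka call itself stays the same as in the answer.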
Upvotes: 2