Reputation: 397
System: Linux. Bash 4.
I have the following file, which will be read into a script as a variable:
/path/sample_A.bam A 1
/path/sample_B.bam B 1
/path/sample_C1.bam C 1
/path/sample_C2.bam C 2
I want to append "_string" to the filename in the first column, just before the extension (.bam). It's a bit trickier because the name starts with a path.
Desired output:
/path/sample_A_string.bam A 1
/path/sample_B_string.bam B 1
/path/sample_C1_string.bam C 1
/path/sample_C2_string.bam C 2
My attempt: I wrote the following script (I ran: bash script.sh):
List=${1};
awk -F'\t' -vOFS='\t' '{ $1 = "${1%.bam}" "_string.bam" }1' < ${List} ;
And its output was:
${1%.bam}_string.bam
${1%.bam}_string.bam
${1%.bam}_string.bam
${1%.bam}_string.bam
Problem: I followed the idea of using awk for this substitution as in this thread https://unix.stackexchange.com/questions/148114/how-to-add-words-to-an-existing-column , but the parameter expansion ${1%.bam} is clearly not being recognised by AWK as I intended. Does someone know the correct syntax for that part of the code? It was meant to mean "the whole first entry of the first column, except the trailing .bam". I used ${1%.bam} because it works in Bash, but AWK is another language and its syntax probably differs. Thank you!
Upvotes: 1
Views: 1398
Reputation: 2471
You can try it this way with awk (this assumes the first dot on each line is the one before bam):
awk -v a='_string' 'BEGIN{FS=OFS="."}{$1=$1 a}1' infile
Upvotes: 1
Reputation: 1695
sed -i 's/\.bam/_string\.bam/g' myfile.txt
It's a single line with sed: just replace .bam with _string.bam. Note that -i edits myfile.txt in place.
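The one-liner above replaces every .bam on the line, including any that appear in later columns or in a directory name. A minimal sketch that limits the change to the end of the first column, assuming the columns are TAB-separated (the question's -F'\t' suggests so) and that sed supports -E; the function name add_suffix_sed is hypothetical:

```shell
# Hypothetical helper: insert "_string" before a trailing ".bam" in the
# first TAB-separated column only. Reads the named files, or stdin if none.
add_suffix_sed() {
    tab=$(printf '\t')   # a literal TAB, portable across sed implementations
    # ^([^TAB]*)  : the first column up to the extension
    # \.bam       : a literal ".bam"
    # (TAB|$)     : which must end the first column
    sed -E "s/^([^${tab}]*)\.bam(${tab}|\$)/\1_string.bam\2/" "$@"
}

# Usage: only the first column changes, even if other columns mention .bam
printf '/path/sample_A.bam\tA\t1\n' | add_suffix_sed
```

This writes to standard output, so the input file is left untouched unless you redirect the result yourself.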
Upvotes: 2
Reputation: 133458
If I understood your requirement correctly, could you please try the following.
val="_string"
awk -v value="$val" '{sub(".bam",value"&")} 1' Input_file
Brief explanation: -v value="$val" passes the value of the shell variable val to the awk variable value. The sub function of awk then substitutes the string .bam with the contents of value followed by the matched text .bam itself, which is denoted by &. Finally, 1 means print the line, whether edited or not.
Why OP's attempt didn't work: Dear OP, in awk we can't use shell variables directly without defining them in the awk language. So what you are trying is NOT taken as an awk variable; instead, awk treats it as a string and prints it as it is. My explanation above shows how to pass shell variables into awk.
NOTE: In case you have multiple occurrences of .bam on a line, then please change sub to gsub in the above code. Also, in case your Input_file is TAB-delimited, then use awk -F'\t' in the above code.
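Putting the notes above together, here is a sketch that applies sub only to the first column and anchors the match to the end of that column, so a .bam elsewhere on the line is left alone. It assumes TAB-delimited input; the function name add_suffix_awk and the variable name suffix are illustrative, not part of the answer above:

```shell
# Hypothetical helper: add_suffix_awk SUFFIX [FILE...]
# Inserts SUFFIX before a trailing ".bam" in column 1 of TAB-delimited input.
add_suffix_awk() {
    suffix=$1
    shift
    # \.bam$ matches a literal ".bam" at the end of $1 only;
    # "&" in the replacement stands for the matched text.
    awk -F'\t' -v OFS='\t' -v suffix="$suffix" \
        '{ sub(/\.bam$/, suffix "&", $1) } 1' "$@"
}

# Usage
printf '/path/sample_A.bam\tA\t1\n' | add_suffix_awk "_string"
```

One caveat: a suffix containing & or a backslash would be interpreted specially by sub, so this sketch is only suitable for plain suffixes such as "_string".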
Upvotes: 2
Reputation: 85560
Note that the parameter expansion you applied on $1 won't apply inside awk, as the entire body of the awk command is passed in '..', which sends its content literally without any shell parsing. Hence the string "${1%.bam}" is assigned as-is to the first column.
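For comparison, the ${...%.bam} expansion does work when the shell itself processes each line. A minimal sketch of that approach, assuming TAB-separated columns; the function name add_suffix_loop is hypothetical, and a shell loop like this is usually much slower than awk on large files:

```shell
# Hypothetical helper: read TAB-separated lines from stdin and rewrite
# column 1 with shell parameter expansion instead of awk.
add_suffix_loop() {
    tab=$(printf '\t')
    # "path" receives the first column, "rest" the remainder of the line.
    # The "|| [ -n ... ]" part also handles a last line without a newline.
    while IFS="$tab" read -r path rest || [ -n "$path" ]; do
        # ${path%.bam} removes a trailing ".bam" if present
        printf '%s\t%s\n' "${path%.bam}_string.bam" "$rest"
    done
}

# Usage
printf '/path/sample_A.bam\tA\t1\n' | add_suffix_loop
```

Note that this appends _string.bam even when the first column does not end in .bam, which mirrors what the original attempt was trying to do.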
You can do this completely in Awk:
awk -F'\t' 'BEGIN { OFS = FS }{ n=split($1, arr, "."); $1 = arr[1]"_string."arr[2] }1' file
The code splits the content of $1 on the delimiter . into an array arr. The part of the string up to the first . is stored in arr[1], and the subsequent pieces are stored in the following array indices. We then reconstruct the filename by concatenating the array entries, with _string added to the part of the filename before the extension.
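The command above assumes the first column contains exactly one dot. If a path could contain more (for example a directory such as v1.2), a variant that rejoins every piece except the last keeps the suffix next to the extension. This is only a sketch under that assumption; the function name add_suffix_split is hypothetical:

```shell
# Hypothetical helper: insert "_string" before the LAST dot of column 1
# in TAB-delimited input. Reads the named files, or stdin if none.
add_suffix_split() {
    awk -F'\t' 'BEGIN { OFS = FS }
    {
        n = split($1, arr, ".")          # n pieces, arr[n] is the extension
        if (n > 1) {
            base = arr[1]
            for (i = 2; i < n; i++)      # rejoin everything before the last dot
                base = base "." arr[i]
            $1 = base "_string." arr[n]
        }
    } 1' "$@"
}

# Usage: the dot inside the directory name is preserved
printf '/data/v1.2/sample_A.bam\tA\t1\n' | add_suffix_split
```

It still treats whatever follows the last dot as the extension, so a first column with a dotted directory but no file extension would be rewritten incorrectly.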
Upvotes: 3