Perlnika

Reputation: 5056

regexp find and replace: bash variables inside sed

I would like to remove this sequence when present at the beginning of the line:

ATCGGAAGAGCACACGTCTGAACTCCAGTCACTGACCAATCTCGTATGCCGTCTTCTGCTTG followed by at least 3 A characters.

Both the sequence and the run of A characters should be removed; the rest of the file should be preserved.

My input files look like this:

@M00946:3:000000000-A2WF2:1:1101:18115:1962 1:N:0:2
GATCGGAAGAGCACACGTCTGAACTCCAGTCACTGACCAATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAAACATTTTCTTTCTTACTTCGTTCACTTTCCACTTCTTTCTCCCTATCTTCCCCCTTCTGTCTGCCCCAGCTGTCTATCCCACTTATTGTCTCCCCCCACTGCCCCACACTCCTACCTTCTTCATCTTCACCTAACACCTCCCGCTCCCTCCTTATCGTCTCTTATCCTTTCCTTGTTCC
+
????????DDDDDDDDGGGGGGHHIIIIHHHIIIIFHIIIH/CGFHHIIIIHEDHHIIIIHI=5EEGFEHHEC+5,,4@,@,,....--..+77,,.6..6.....7.4..7.76=..-5.>.4-)134-.5....-3*))0***1*********10*0**01*1*)''..0***.)0'))*****00*11******01***0****0*)**0)'''...*0)0*11********1****1*0********
@M00946:3:000000000-A2WF2:1:1101:19888:2900 1:N:0:2
GATCGGAAGAGCACACGTCTGAACTCCAGTCACTGACCAATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAAAAACACAAATACCGTTCCAATATCTTTTTGTTTCATGTCTAATAAC
+
<<??????BB?BBBBBCAFFFCFHF;>EFCDFGFFHFBGHCA=FHA>EFGEE7CF>F?FFHB=?EEGF>>DH5<)++,++,4,,4+=:,,,,5,,,,,,,,),33?,3,3,3,,,,33

I tried a script, replace.sh, which looks like this:

file=$1;
adapter_sequence=$2;
sed -r "s/${adapter_sequence}A{3}//" $file

from the command line:

./replace.sh file.fastq GATCGGAAGAGCACACGTCTGAACTCCAGTCACTGACCAATCTCGTATGCCGTCTTCTGCTTG

It did not work. Any help in any scripting language will be appreciated.

Upvotes: 1

Views: 193

Answers (1)

anubhava

Reputation: 784948

I believe you have $1 and $2 reversed. Have it like this:

adapter_sequence=$2
sed "s/$adapter_sequence//" "$1"

You wrote: "In the ideal case I would like to remove all adapter sequences starting at the beginning of line followed by at least three A letters."

Try this sed:

sed -r "s/^${adapter_sequence}A{3,}//" file
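As a minimal sketch of how the anchored pattern behaves, the snippet below wraps it in a small function and runs it on three short test lines. The function name strip_adapter and the toy adapter GATC are made up for illustration and are not from the question. It uses -E, which GNU sed treats the same as -r and which BSD sed also accepts. It also assumes the adapter contains only plain letters, since the variable is expanded straight into the regex.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not from the original post): reads lines on stdin and
# removes the adapter plus the run of A's that follows it.
strip_adapter() {
    local adapter_sequence=$1
    # Double quotes let bash expand the variable before sed sees the script.
    # ^ anchors the match at the start of the line; A{3,} means three or more A's.
    sed -E "s/^${adapter_sequence}A{3,}//"
}

# Toy input: only the first line has the adapter at the start followed by >= 3 A's.
printf '%s\n' 'GATCAAAAACATT' 'TTGATCAAAAC' 'GATCAAC' | strip_adapter GATC
```

Only the first line should change; the second has the adapter away from the line start and the third has just two A's after it, so both pass through untouched.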

Upvotes: 2
