Reputation: 5056
I would like to remove this sequence when present at the beginning of the line:
ATCGGAAGAGCACACGTCTGAACTCCAGTCACTGACCAATCTCGTATGCCGTCTTCTGCTTG followed by at least 3 A characters.
Both, sequence and multiple A should be removed and the rest of the file should be preserved.
My input files look like this:
@M00946:3:000000000-A2WF2:1:1101:18115:1962 1:N:0:2
GATCGGAAGAGCACACGTCTGAACTCCAGTCACTGACCAATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAAACATTTTCTTTCTTACTTCGTTCACTTTCCACTTCTTTCTCCCTATCTTCCCCCTTCTGTCTGCCCCAGCTGTCTATCCCACTTATTGTCTCCCCCCACTGCCCCACACTCCTACCTTCTTCATCTTCACCTAACACCTCCCGCTCCCTCCTTATCGTCTCTTATCCTTTCCTTGTTCC
+
????????DDDDDDDDGGGGGGHHIIIIHHHIIIIFHIIIH/CGFHHIIIIHEDHHIIIIHI=5EEGFEHHEC+5,,4@,@,,....--..+77,,.6..6.....7.4..7.76=..-5.>.4-)134-.5....-3*))0***1*********10*0**01*1*)''..0***.)0'))*****00*11******01***0****0*)**0)'''...*0)0*11********1****1*0********
@M00946:3:000000000-A2WF2:1:1101:19888:2900 1:N:0:2
GATCGGAAGAGCACACGTCTGAACTCCAGTCACTGACCAATCTCGTATGCCGTCTTCTGCTTGAAAAAAAAAAAACACAAATACCGTTCCAATATCTTTTTGTTTCATGTCTAATAAC
+
<<??????BB?BBBBBCAFFFCFHF;>EFCDFGFFHFBGHCA=FHA>EFGEE7CF>F?FFHB=?EEGF>>DH5<)++,++,4,,4+=:,,,,5,,,,,,,,),33?,3,3,3,,,,33
I was trying to use script replace.sh which looks like this
file=$1;
adapter_sequence=$2;
sed -r "s/${adapter_sequence}A{3}//" $file
from the command line:
./replace.sh file.fastq GATCGGAAGAGCACACGTCTGAACTCCAGTCACTGACCAATCTCGTATGCCGTCTTCTGCTTG
It did not work. Any help in any script language will be appreciated.
Upvotes: 1
Views: 193
Reputation: 784948
I believe your have $1
, $2
reversed. Have it like this:
adapter_sequence=$2
sed "s/$adapter_sequence//" $1
In the ideal case I would like to remove all adapter sequences starting at the beginning of line followed by at least three A letters,
Try this sed:
sed -r "s/^${adapter_sequence}A{3,}//" file
Upvotes: 2