Andrea Spinelli

Reputation: 61

sed & regex expression

I'm trying to add the string 'chr' to the start of every line where it isn't already there. This is only needed on lines that don't start with '#' (such as the '##' header lines). At first I used grep + sed, as follows, but I want to run the command so that it overwrites the original file.

grep -v "^#" 5b110660bf55f80059c0ef52.vcf | grep -v 'chr' | sed 's/^/chr/g'

So, to edit the file in place, I wrote this:

sed -i -E '/^#.*$|^chr.*$/ s/^/chr/' 5b110660bf55f80059c0ef52.vcf

Here is the content of the VCF file:

##FORMAT=<ID=DP4,Number=4,Type=Integer,Description="#ref plus strand,#ref minus strand, #alt plus strand, #alt minus strand">
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  24430-0009S21_GM17-12140
1   955597  95692   G   T   1382    PASS    VARTYPE=1;BGN=0.00134309;ARL=150;DER=53;DEA=55;QR=40;QA=39;PBP=1091;PBM=300;TYPE=SNP;DBXREF=dbSNP:rs115173026,g1000:0.2825,esp5400:0.2755,ExAC:0.2290,clinvar:rs115173026,CLNSIG:2,CLNREVSTAT:mult,CLNSIGLAB:Benign;SGVEP=AGRN|+|NM_198576|1|c.45G>T|p.:(p.Pro15Pro)|synonymous GT:DP:AD:DP4    0/1:125:64,61:50,14,48,13
chr1    957898  82729935    G   T   1214    off_target  VARTYPE=1;BGN=0.00113362;ARL=149;DER=50;DEA=55;QR=38;QA=40;PBP=245;PBM=978;NVF=0.53;TYPE=SNP;DBXREF=dbSNP:rs2799064,g1000:0.3285;SGVEP=AGRN|+|NM_198576|2|c.463+56G>T|.|intronic    GT:DP:AD:DP4    0/1:98:47,51:9,38,10,41

Upvotes: 0

Views: 206

Answers (3)

bipll

Reputation: 11950

This can be done with a single sed invocation. The script itself is something like the following.

If you have input in this format:

$ echo -e '#\n#\n123chr456\n789chr123\nabc'
#
#
123chr456
789chr123
abc

then prepending chr to non-comment lines that don't contain chr is done with:

$ echo -e '#\n#\n123chr456\n789chr123\nabc' | sed '/^#/ {p
d
}
/chr/ {p
d
}
s/^/chr/'

which prints

#
#
123chr456
789chr123
chrabc

(Note the multiline sed script.)

Now you only need to run this script on your file in place (with -i in modern sed versions).
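A minimal sketch of that in-place run, assuming GNU sed (for `-i` with no backup suffix). It expresses the same logic with the `b` command, which jumps to the end of the script so the line is printed unchanged, instead of the `p`/`d` pairs; the file `sample.vcf` and its lines are hypothetical.

```shell
# Hypothetical sample file: two header lines, one chr-prefixed line, one bare line
printf '%s\n' '##fileformat=VCFv4.2' '#CHROM POS' 'chr1 957898' '1 955597' > sample.vcf

# GNU sed: -i edits the file in place (-i.bak would also keep a backup copy)
#   /^#/b    -> header/comment line: branch to end of script, printed unchanged
#   /chr/b   -> line already contains chr: same
#   s/^/chr/ -> every remaining line gets chr prepended
sed -i -e '/^#/b' -e '/chr/b' -e 's/^/chr/' sample.vcf

cat sample.vcf
```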

Upvotes: 0

Hazzard17

Reputation: 723

If I understand your expected result correctly, try:

sed -ri '/^(#|chr)/! s/^/chr/' file
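`-r` and a bare `-i` are GNU sed options. A sketch of a variant that should also work with BSD/macOS sed, assuming your sed accepts `-E` and an attached backup suffix (`-i.bak`); the test file `test.vcf` is hypothetical and its lines are trimmed from the question's sample.

```shell
# Hypothetical test file built from the question's sample (columns trimmed)
printf '%s\n' '##FORMAT=<ID=DP4,Number=4>' '#CHROM POS ID' '1 955597 95692' 'chr1 957898 82729935' > test.vcf

# -E      : extended regex, so (#|chr) is an alternation group
# -i.bak  : edit in place, keeping the original as test.vcf.bak
# /^(#|chr)/!s/^/chr/ : the ! negates the address, so only lines starting
#                       with neither # nor chr get chr prepended
sed -E -i.bak '/^(#|chr)/!s/^/chr/' test.vcf

cat test.vcf
```

Unlike the `/chr/` test in the multiline script above, this only looks at the start of the line, so a line that merely contains chr somewhere later would still be prefixed.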

Upvotes: 1

Ed Morton

Reputation: 204558

Your question isn't clear, and you didn't provide the expected output, so we can't test a potential solution. But if all you want is to add chr to the start of lines where it's not already present and which don't start with #, then that's just:

awk '!/^(#|chr)/{$0="chr" $0} 1' file

To overwrite the original file with GNU awk:

awk -i inplace '!/^(#|chr)/{$0="chr" $0} 1' file

and with any awk:

awk '!/^(#|chr)/{$0="chr" $0} 1' file > tmp && mv tmp file
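A quick check of the portable form on a small hypothetical file, `sample.vcf`: because lines that already start with chr are skipped, running the command a second time leaves the file unchanged (no `chrchr` prefixes).

```shell
# Hypothetical sample file
printf '%s\n' '##fileformat=VCFv4.2' '#CHROM POS' '1 955597' 'chr1 957898' > sample.vcf

# Apply the portable command twice to show it is safe to re-run.
# !/^(#|chr)/ selects lines starting with neither # nor chr;
# the trailing 1 is an always-true pattern, so every line is printed.
for run in 1 2; do
  awk '!/^(#|chr)/{$0="chr" $0} 1' sample.vcf > tmp && mv tmp sample.vcf
done

cat sample.vcf
```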

Upvotes: 0
