I have a file (test.bed) that looks like this (it might not be tab-separated):
chr1 10002 10116 id=1;frame=0;strand=+; 0 +
chr1 10116 10122 id=2;frame=0;strand=+; 0 +
chr1 10122 10128 id=3;frame=0;strand=+; 0 +
chr1 10128 10134 id=4;frame=0;strand=+; 0 +
chr1 10134 10140 id=5;frame=0;strand=+; 0 +
chr1 10140 10146 id=6;frame=0;strand=+; 0 +
chr1 10146 10182 id=7;frame=0;strand=+; 0 +
chr1 10182 10188 id=8;frame=0;strand=+; 0 +
chr1 10188 10194 id=9;frame=0;strand=+; 0 +
chr1 10194 10200 id=10;frame=0;strand=+; 0 +
I want to produce the following output (which should be tab-separated):
chr1 10002 10116 id=1 0 +
chr1 10116 10122 id=2 0 +
chr1 10122 10128 id=3 0 +
chr1 10128 10134 id=4 0 +
chr1 10134 10140 id=5 0 +
chr1 10140 10146 id=6 0 +
chr1 10146 10182 id=7 0 +
chr1 10182 10188 id=8 0 +
chr1 10188 10194 id=9 0 +
chr1 10194 10200 id=10 0 +
I have tried the following code:
awk 'OFS="\t" split ($0, a, ";"){print a[1],$5,$6}' test.bed
But then I get:
chr1 10002 10116 id=1 40 4+
chr1 10116 10122 id=2 40 4+
chr1 10122 10128 id=3 40 4+
chr1 10128 10134 id=4 40 4+
chr1 10134 10140 id=5 40 4+
chr1 10140 10146 id=6 40 4+
chr1 10146 10182 id=7 40 4+
chr1 10182 10188 id=8 40 4+
chr1 10188 10194 id=9 40 4+
chr1 10194 10200 id=10 40 4+
What am I doing wrong? Somehow the number '4' is added to the last two fields. I thought the '4' might have something to do with splitting the 4th field; however, I tried a similar file where the 3rd field was split instead, and still got '4' added to the last two fields. I am rather new to awk, so I guess it is a syntax error. Any help would be appreciated.
If you set your field separator to whitespace or semicolons, you won't have to do the splitting yourself:
$ awk '{print $1,$2,$3,$4,$8,$9}' FS='[[:space:]]+|;' OFS='\t' file
chr1 10002 10116 id=1 0 +
chr1 10116 10122 id=2 0 +
chr1 10122 10128 id=3 0 +
chr1 10128 10134 id=4 0 +
chr1 10134 10140 id=5 0 +
chr1 10140 10146 id=6 0 +
chr1 10146 10182 id=7 0 +
chr1 10182 10188 id=8 0 +
chr1 10188 10194 id=9 0 +
chr1 10194 10200 id=10 0 +
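To see why this command prints $8 and $9 rather than $5 and $6, it can help to number every field this FS produces. The sketch below is illustrative only: the shell function name show_fields is hypothetical, and it assumes an awk that supports POSIX bracket classes such as [[:space:]], which the command above already requires.

```shell
#!/bin/sh
# Hypothetical helper: print each field of every input line as "index: [value]",
# splitting on runs of whitespace OR a single semicolon (the same FS as above).
show_fields() {
  awk -F '[[:space:]]+|;' '{ for (i = 1; i <= NF; i++) printf "%d: [%s]\n", i, $i }'
}

# One line from test.bed; note the "; " between strand=+ and 0.
printf '%s\n' 'chr1 10002 10116 id=1;frame=0;strand=+; 0 +' | show_fields
```

This should list nine fields with field 7 empty: the ";" after strand=+ and the space after it are two separate separators, so an empty field sits between them. That is why the wanted "0" and "+" end up in $8 and $9.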
As for what you are doing wrong in:
awk 'OFS="\t" split ($0, a, ";"){print a[1],$5,$6}'
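An awk program is made of condition{block} pairs, and setting the value of OFS and splitting are not conditions. They are statements that belong inside the block.
Because nothing separates them, awk parses your condition as the single assignment OFS = "\t" split($0, a, ";"): OFS becomes a tab concatenated with the return value of split(), which is 4 because each line contains three semicolons and therefore splits into four pieces. That is where the "4" after every tab comes from, and why it does not depend on which field you split.
OFS does not change from line to line, so it only needs to be set once. You can do this with the -v option, in the BEGIN block, or as an assignment after the script.
Note that a[1] keeps whatever separators the input had between the first four columns, so these fixes give fully tab-separated output only if the input already uses tabs there. Valid alternatives: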
$ awk -v OFS='\t' '{split($0,a,";");print a[1],$5,$6}' file
$ awk 'BEGIN{OFS="\t"}{split($0,a,";");print a[1],$5,$6}' file
$ awk '{split($0,a,";");print a[1],$5,$6}' OFS='\t' file
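To see the accidental concatenation in action, the sketch below keeps the question's original pattern but prints OFS instead of the fields, with the tab replaced by a visible marker. The function name show_ofs and the <TAB> marker are illustrative choices, not part of the original answer.

```shell
#!/bin/sh
# Hypothetical helper: run the question's original pattern, then print OFS.
# awk parses  OFS="\t" split($0, a, ";")  as  OFS = ("\t" split($0, a, ";")),
# so OFS becomes a tab followed by split()'s return value (the piece count).
show_ofs() {
  awk 'OFS="\t" split($0, a, ";") {
    gsub(/\t/, "<TAB>", OFS)   # make the tab visible in the output
    print "OFS is [" OFS "]"
  }'
}

# Three semicolons -> four pieces -> OFS is a tab followed by "4".
printf '%s\n' 'chr1 10002 10116 id=1;frame=0;strand=+; 0 +' | show_ofs
```

With a line containing only two semicolons, such as a;b;c, the same helper should report <TAB>3, which confirms that the number comes from split() and not from the position of the field being split.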