Reputation: 3022
I am trying to parse a file using awk, but I am not getting the desired output and cannot figure out why. Thank you :).
input.txt
chr1 955543 955763 AGRN-6|pr=2|gc=75 0 +
chr1 957571 957852 AGRN-7|pr=3|gc=61.2 0 +
chr1 970621 970740 AGRN-8|pr=1|gc=57.1 0 +
current output.txt
chr1 955543 955763 AGRN-6 pr=2 gc=75 0 +
chr1 957571 957852 AGRN-7 pr=3 gc=61.2 0 +
chr1 970621 970740 AGRN-8 pr=1 gc=57.1 0 +
desired output.txt (the |pr=2|gc=75 0 part, and the space between lines, removed from the output)
chr1 955543 955763 AGRN-6 +
chr1 957571 957852 AGRN-7 +
chr1 970621 970740 AGRN-8 +
Here is what I have tried:
awk -F"[*|]" '{print $1, $2, $3, $4, $5, $6}' input.txt > output.txt
Upvotes: 0
Views: 80
Reputation: 67467
Another alternative (if you don't care about the output spacing):
$ awk '{split($4,a,"|"); print $1,$2,$3,a[1],$NF}' file
chr1 955543 955763 AGRN-6 +
chr1 957571 957852 AGRN-7 +
chr1 970621 970740 AGRN-8 +
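To see why this works, here is a minimal sketch (the variable names line and pieces are mine, not from the question) that prints what split() puts into the array for the first sample line. split() returns the number of pieces and fills a[1], a[2], ... in order, so a[1] is the part of field 4 before the first pipe:

```shell
# First record from the question; "line" and "pieces" are illustrative names.
line='chr1 955543 955763 AGRN-6|pr=2|gc=75 0 +'

# split() cuts field 4 on "|" and returns how many pieces it produced.
pieces=$(printf '%s\n' "$line" | awk '{n = split($4, a, "|"); print n, a[1], a[2], a[3]}')
printf '%s\n' "$pieces"
```

For this line it should print 3 AGRN-6 pr=2 gc=75, which is why a[1] alone gives the name without the pr= and gc= parts.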
Upvotes: 1
Reputation: 3055
Probably the easiest solution:
awk -F"|" '{print $1" +"}' input.txt > output.txt
Note, however, that this hard-codes the trailing "+" rather than reading it from the input, so it is only correct when every line ends in "+". Output:
chr1 955543 955763 AGRN-6 +
chr1 957571 957852 AGRN-7 +
chr1 970621 970740 AGRN-8 +
Otherwise, try
awk -F"[| ]+" '{print $1, $2, $3, $4, $8}' input.txt > output.txt
which outputs
chr1 955543 955763 AGRN-6 +
chr1 957571 957852 AGRN-7 +
chr1 970621 970740 AGRN-8 +
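If the "space between lines" in the question means empty lines in input.txt (an assumption on my part, the sample does not show them), a bare NF pattern would skip them, and $NF avoids hard-coding the position of the last field. A rough sketch using a temporary file in place of input.txt:

```shell
# Hypothetical input: the question's records with empty lines between them.
tmp=$(mktemp)
printf '%s\n' \
  'chr1 955543 955763 AGRN-6|pr=2|gc=75 0 +' \
  '' \
  'chr1 957571 957852 AGRN-7|pr=3|gc=61.2 0 +' \
  '' \
  'chr1 970621 970740 AGRN-8|pr=1|gc=57.1 0 +' > "$tmp"

# NF is 0 on an empty line, so the pattern is false and the line is skipped.
# $NF is the last field, whatever its position.
cleaned=$(awk -F'[| ]+' 'NF {print $1, $2, $3, $4, $NF}' "$tmp")
rm -f "$tmp"
printf '%s\n' "$cleaned"
```

Lines that contain only spaces are not empty as far as this field separator is concerned, so they would need a different test.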
Upvotes: 1
Reputation: 246754
You could do this:
awk -F '[[:blank:]]+|\\|' '{print $1, $2, $3, $4, $NF}'
That gives you the fields you want, but it does not keep the spacing. This will:
awk '{sub(/\|[^[:blank:]]+[[:blank:]]+[0-9]+/, ""); print }' <<END
chr1 955543 955763 AGRN-6|pr=2|gc=75 0 +
chr1 957571 957852 AGRN-7|pr=3|gc=61.2 0 +
chr1 970621 970740 AGRN-8|pr=1|gc=57.1 0 +
END
chr1 955543 955763 AGRN-6 +
chr1 957571 957852 AGRN-7 +
chr1 970621 970740 AGRN-8 +
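The reason the second command keeps the spacing: sub() without a third argument edits the whole record ($0) and no field is assigned, so awk never rebuilds the line with OFS. A small sketch with a tab-separated copy of the first line (the tabs are my assumption; the question does not say which whitespace the file uses):

```shell
# Hypothetical tab-separated version of the first sample line.
tabbed=$(printf 'chr1\t955543\t955763\tAGRN-6|pr=2|gc=75\t0\t+\n' |
  awk '{sub(/\|[^[:blank:]]+[[:blank:]]+[0-9]+/, ""); print}')

# The remaining tabs should be exactly the ones from the input.
printf '%s\n' "$tabbed"
```

With the field-based commands the same input would come out space-separated instead, unless OFS is set to a tab.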
Upvotes: 3