justaguy

Reputation: 3022

awk to parse input and remove text in field

I am trying to parse a file using awk, but I am not getting the desired output and cannot figure out why. Thank you :).

input.txt

chr1    955543  955763  AGRN-6|pr=2|gc=75   0   +
chr1    957571  957852  AGRN-7|pr=3|gc=61.2 0   +
chr1    970621  970740  AGRN-8|pr=1|gc=57.1 0   +

current output.txt

chr1    955543  955763  AGRN-6 pr=2 gc=75   0   +

chr1    957571  957852  AGRN-7 pr=3 gc=61.2 0   +

chr1    970621  970740  AGRN-8 pr=1 gc=57.1 0   +

desired output.txt (with the |pr=2|gc=75 part and the following 0 column removed, and no blank lines between rows)

chr1    955543  955763  AGRN-6  +
chr1    957571  957852  AGRN-7  +
chr1    970621  970740  AGRN-8  +

Here is what I have tried:

awk -F"[*|]" '{print $1, $2, $3, $4, $5, $6,}' input.txt > output.txt
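To see why this attempt produces the current output, it can help to print every field awk actually sees. The sketch below assumes the columns are separated by spaces exactly as pasted above (the real file may use tabs). `-F"[*|]"` splits only on `*` or `|`, so everything before the first `|` lands in `$1`, and `print` then joins the few resulting fields with a single space (the default `OFS`), which is why the pipes simply turn into spaces. (As pasted, the trailing comma after `$6` is also an awk syntax error.) `sample.txt` and `fields.txt` are hypothetical file names.

```shell
# Recreate the first input line (spacing assumed to match the paste above)
printf 'chr1    955543  955763  AGRN-6|pr=2|gc=75   0   +\n' > sample.txt

# Print each field with its index, using the same separator as the attempt
awk -F'[*|]' '{ for (i = 1; i <= NF; i++) printf "$%d=[%s]\n", i, $i }' sample.txt > fields.txt
cat fields.txt
```

Only three fields come out, so `$4`, `$5` and `$6` in the original command are empty.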

Upvotes: 0

Views: 80

Answers (3)

karakfa

Reputation: 67467

Another alternative (if you don't care about the output spacing):

$ awk '{split($4,a,"|"); print $1,$2,$3,a[1],$NF}' file
chr1 955543 955763 AGRN-6 +
chr1 957571 957852 AGRN-7 +
chr1 970621 970740 AGRN-8 +
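If the spacing does matter and the file is tab-separated (an assumption; BED-style files like this one usually are), the same `split()` idea can keep tabs by setting both the input separator and `OFS` to a tab. The file names `input.tsv` and `output.tsv` are hypothetical.

```shell
# Hypothetical tab-separated copy of two of the question's input lines
printf 'chr1\t955543\t955763\tAGRN-6|pr=2|gc=75\t0\t+\n' > input.tsv
printf 'chr1\t957571\t957852\tAGRN-7|pr=3|gc=61.2\t0\t+\n' >> input.tsv

# Split the name column on "|" and keep only the part before the first pipe;
# -F'\t' reads tab-separated fields and OFS='\t' writes them back with tabs
awk -F'\t' -v OFS='\t' '{ split($4, a, "|"); print $1, $2, $3, a[1], $NF }' input.tsv > output.tsv
cat output.tsv
```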

Upvotes: 1

F. Knorr

Reputation: 3055

Probably the easiest solution:

awk -F"|" '{print $1"   +"}' input.txt > output.txt

In this solution, however, the trailing "+" is hardcoded rather than read from the input. Output:

chr1    955543  955763  AGRN-6   +
chr1    957571  957852  AGRN-7   +
chr1    970621  970740  AGRN-8   +
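To avoid hardcoding the strand, one option (a sketch, not part of the original answer) is to keep the `-F"|"` trick for the name but take the last blank-separated word of the line with `split()`. The second input line below uses a hypothetical "-" strand purely to show that the value is read rather than fixed.

```shell
printf 'chr1    955543  955763  AGRN-6|pr=2|gc=75   0   +\n' > input.txt
# Hypothetical minus-strand line, added for illustration only
printf 'chr1    970621  970740  AGRN-8|pr=1|gc=57.1 0   -\n' >> input.txt

# $1 is everything before the first "|"; split() breaks the whole line on
# runs of blanks, and w[n] is its last word (the strand column)
awk -F'|' '{ n = split($0, w, "[[:blank:]]+"); print $1 "   " w[n] }' input.txt > output.txt
cat output.txt
```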

Otherwise, try

awk -F"[| ]+" '{print $1, $2, $3, $4, $8}' input.txt > output.txt

which outputs

chr1 955543 955763 AGRN-6 +
chr1 957571 957852 AGRN-7 +
chr1 970621 970740 AGRN-8 +
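The bracket expression `[| ]` matches only a pipe or a literal space. If the real file uses tabs (an assumption; the pasted sample shows spaces), adding `[:blank:]` (space or tab) inside the brackets keeps the same field numbering, so `$8` is still the strand. The tab-separated input below is hypothetical.

```shell
# Hypothetical tab-separated version of the input
printf 'chr1\t955543\t955763\tAGRN-6|pr=2|gc=75\t0\t+\n' > input.txt
printf 'chr1\t957571\t957852\tAGRN-7|pr=3|gc=61.2\t0\t+\n' >> input.txt

# Any run of "|", spaces or tabs counts as one separator:
# $1..$4 = chr, start, end, name; $5..$7 = pr, gc, 0; $8 = strand
awk -F'[|[:blank:]]+' '{ print $1, $2, $3, $4, $8 }' input.txt > output.txt
cat output.txt
```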

Upvotes: 1

glenn jackman

Reputation: 246754

You could do this:

awk -F '[[:blank:]]+|\\|' '{print $1, $2, $3, $4, $NF}' input.txt

That gives you the fields you want, but it does not keep the spacing. This will:

awk '{sub(/\|[^[:blank:]]+[[:blank:]]+[0-9]+/, ""); print }' <<END
chr1    955543  955763  AGRN-6|pr=2|gc=75   0   +
chr1    957571  957852  AGRN-7|pr=3|gc=61.2 0   +
chr1    970621  970740  AGRN-8|pr=1|gc=57.1 0   +
END
chr1    955543  955763  AGRN-6   +
chr1    957571  957852  AGRN-7   +
chr1    970621  970740  AGRN-8   +
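The `sub()` regex above assumes the column after the name is an unsigned integer (`[0-9]+`). If it could hold values like `0.5` or `-1` (hypothetical values, not in the sample), one small generalization is to match any non-blank token there instead. This is a sketch of that variant, not the answer's original command.

```shell
# Hypothetical input with non-integer values in the column after the name
printf 'chr1    955543  955763  AGRN-6|pr=2|gc=75   0.5   +\n' > input.txt
printf 'chr1    957571  957852  AGRN-7|pr=3|gc=61.2 -1   +\n' >> input.txt

# Delete from the first "|" through the end of the next blank-separated
# token, leaving the original spacing before the strand intact
awk '{ sub(/\|[^[:blank:]]*[[:blank:]]+[^[:blank:]]+/, ""); print }' input.txt > output.txt
cat output.txt
```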

Upvotes: 3
