Reputation: 343
I need to split a file with multiple columns that looks like this:
TCONS_00000001 q1:Ovary1.13|Ovary1.13.1|100|32.599877 q2:Ovary2.16|Ovary2.16.1|100|88.36
TCONS_00000002 q1:Ovary1.19|Ovary1.19.1|100|12.876644 q2:Ovary2.15|Ovary2.15.1|100|365.44
TCONS_00000003 q1:Ovary1.19|Ovary1.19.2|0|0.000000 q2:Ovary2.19|Ovary2.19.1|100|64.567
Output needed:
TCONS_00000001 Ovary1.13.1 32.599877 Ovary2.16.1 88.36
TCONS_00000002 Ovary1.19.1 12.876644 Ovary2.15.1 365.44
TCONS_00000003 Ovary1.19.2 0.000000 Ovary2.19.1 64.567
My attempt:
awk 'BEGIN {OFS=FS="\t"}{split($2,two,"|");split($3,thr,"|");print $1,two[2],two[4],thr[2],thr[4]}' in.file
Problem:
I have many more columns to split in the same way as columns 2 and 3, so I would like a shorter solution than splitting every column one by one.
Upvotes: 1
Views: 1104
Reputation: 23667
$ # borrowing simplicity from @Inian's answer ;)
$ awk 'BEGIN{FS=OFS="\t"}
{for(i=2; i<=NF; i++){split($i,a,/[:|]/); $i=a[3] "\t" a[5]}} 1' ip.txt
TCONS_00000001 Ovary1.13.1 32.599877 Ovary2.16.1 88.36
TCONS_00000002 Ovary1.19.1 12.876644 Ovary2.15.1 365.44
TCONS_00000003 Ovary1.19.2 0.000000 Ovary2.19.1 64.567
$ # previous solution which leaves tab character at end
$ awk -F'\t' '{printf "%s\t",$1;
for(i=2; i<=NF; i++){split($i,a,/[:|]/); printf "%s\t%s\t",a[3],a[5]};
print ""}' ip.txt
TCONS_00000001 Ovary1.13.1 32.599877 Ovary2.16.1 88.36
TCONS_00000002 Ovary1.19.1 12.876644 Ovary2.15.1 365.44
TCONS_00000003 Ovary1.19.2 0.000000 Ovary2.19.1 64.567
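To see why indices 3 and 5 are the ones to keep, it can help to print what split() produces for a single field. This is only a sketch, assuming each field looks exactly like the samples in the question; `show_parts` is a made-up helper name, not part of any answer here.

```shell
# show_parts is a hypothetical helper used only for this illustration.
# It splits one field on ":" or "|" and prints each piece with its index.
show_parts() {
    printf '%s\n' "$1" |
    awk '{ n = split($0, a, /[:|]/); for (i = 1; i <= n; i++) print i, a[i] }'
}

show_parts 'q1:Ovary1.13|Ovary1.13.1|100|32.599877'
# 1 q1
# 2 Ovary1.13
# 3 Ovary1.13.1
# 4 100
# 5 32.599877
```

The prefix (q1) and the first name take slots 1 and 2, so the wanted name and value sit in slots 3 and 5.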
Upvotes: 2
Reputation: 85530
While Sundeep's answer is great, if you plan to repeat the same action on every field of every record, I suggest putting it in a function and calling that for each field.
I would write an awk script as below
#!/usr/bin/env awk
function split_args(record) {
    n = split(record, split_array, "[:|]")
    return (split_array[3] "\t" split_array[n])
}
BEGIN { FS = OFS = "\t" }
{
    for (i = 2; i <= NF; i++) {
        $i = split_args($i)
    }
    print
}
and invoke it as
awk -f script.awk inputfile
An ugly command-line version of it would be
awk 'function split_args(record) {
    n = split(record, split_array, "[:|]")
    return (split_array[3] "\t" split_array[n])
}
BEGIN { FS = OFS = "\t" }
{
    for (i = 2; i <= NF; i++) {
        $i = split_args($i)
    }
    print
}
' newfile
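For a quick end-to-end check, one sample row can be rebuilt with real tab characters, since pasted text often shows spaces where the file has tabs. The following is a sketch under that tab-separated assumption. `extract_fields` is a made-up wrapper name, and the function here declares `parts` and `n` as local variables, which is a small variation on the script above rather than a copy of it.

```shell
# extract_fields is a hypothetical wrapper name for this illustration.
# It reads the files given as arguments, or standard input if none are given.
extract_fields() {
    awk 'function split_args(record,    parts, n) {
             # parts and n are local; keep the 3rd piece and the last piece
             n = split(record, parts, /[:|]/)
             return parts[3] "\t" parts[n]
         }
         BEGIN { FS = OFS = "\t" }
         { for (i = 2; i <= NF; i++) $i = split_args($i); print }' "$@"
}

# First sample row from the question, with explicit tabs between columns
printf 'TCONS_00000001\tq1:Ovary1.13|Ovary1.13.1|100|32.599877\tq2:Ovary2.16|Ovary2.16.1|100|88.36\n' |
    extract_fields
```

Because the loop runs up to NF, any number of extra columns in the same q:name|name|n|value layout is handled without further changes.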
Upvotes: 2