Reputation: 1645
I have a file with tab-delim columns as shown below:
islet Can c37 P39036.1
islet Hum c37 P240652.3
islet Hum c37 P437357.1
islet Mou c37 P43956.2
prot Can c46 P14676.3
For each unique string in $3, i would need to merge $1,$2 and $4 so that the output looks like below:
islet Can,Hum,Mou c37 P39036.1,P240652.3,P437357.1,P43956.2
prot Can c46 P14676.3
I presume that this is a common task which could have an existing solution. COuld someone offer a solution for this in awk/unix commands.
Upvotes: 1
Views: 234
Reputation: 92854
Awk
solution:
(the file is already sorted by the 3rd field)
awk 'k && $3 != k{ print f1, f2[k], k, f4[k]; delete f2; delete f4 }
{ f1=$1; k=$3; f2[k]=(f2[k]? f2[k]"," : "")$2; f4[k]=(f4[k]? f4[k]"," : "")$4 }
END{ print f1, f2[k], k, f4[k] }' file
The output:
islet Can,Hum,Hum,Mou c37 P39036.1,P240652.3,P437357.1,P43956.2
prot Can c46 P14676.3
Upvotes: 1