Merge rows by duplicate columns in awk/unix commands

Question

I have a file with tab-delim columns as shown below:

islet   Can c37 P39036.1
islet   Hum c37 P240652.3
islet   Hum c37 P437357.1
islet   Mou c37 P43956.2
prot    Can c46 P14676.3

For each unique string in $3, i would need to merge $1,$2 and $4 so that the output looks like below:

islet   Can,Hum,Mou    c37    P39036.1,P240652.3,P437357.1,P43956.2
prot    Can    c46    P14676.3

I presume that this is a common task which could have an existing solution. COuld someone offer a solution for this in awk/unix commands.

RomanPerekhrest · Accepted Answer

Awk solution:

(the file is already sorted by the 3rd field)

awk 'k && $3 != k{ print f1, f2[k], k, f4[k]; delete f2; delete f4 }
     { f1=$1; k=$3; f2[k]=(f2[k]? f2[k]"," : "")$2; f4[k]=(f4[k]? f4[k]"," : "")$4 }
     END{ print f1, f2[k], k, f4[k] }' file

The output:

islet Can,Hum,Hum,Mou c37 P39036.1,P240652.3,P437357.1,P43956.2
prot Can c46 P14676.3

Merge rows by duplicate columns in awk/unix commands

Answers (1)

Related Questions