Reputation: 129
I'm trying to merge columns based on the value in the first field. I've tried using awk, but to no avail. Please see example input and output:
Input:
10013 97 1503384
10013 196 1506234
10013 61 1507385
10013 1559 1508385
10014 1726 1514507
10014 960 1519162
10015 1920 1545535
10015 124 1548915
10015 77 1550284
Desired_Output:
10013 97,196,61,1559 1503384,1506234,1507385,1508385
10014 1726,960 1514507,1519162
10015 1920,124,77 1545535,1548915,1550284
Thanks in advance for any advice!
Upvotes: 3
Views: 971
Reputation: 203229
$ cat tst.awk
$1 != f1 { if (NR>1) print f1, f2, f3; f1=f2=f3=s="" }
{ f1=$1; f2=f2 s $2; f3=f3 s $3; s="," }
END { print f1, f2, f3 }
$ awk -f tst.awk file | column -t
10013 97,196,61,1559 1503384,1506234,1507385,1508385
10014 1726,960 1514507,1519162
10015 1920,124,77 1545535,1548915,1550284
Upvotes: 2
Reputation: 92854
Awk
solution (assuming that the input lines are already sorted):
awk '!a[$1]++{ if ("f2" in b) { print f1, b["f2"], b["f3"]; delete b } }
{
f1=$1;
b["f2"]=(b["f2"]!=""? b["f2"]",":"")$2;
b["f3"]=(b["f3"]!=""? b["f3"]",":"")$3
}
END{ print f1, b["f2"], b["f3"] }' OFS='\t file
delete b
- with this action we'll prevent the array b
from holding all values during the processing (saving memory). It will be cleared on each unique 1st field valueThe output:
10013 97,196,61,1559 1503384,1506234,1507385,1508385
10014 1726,960 1514507,1519162
10015 1920,124,77 1545535,1548915,1550284
Upvotes: 0
Reputation: 67467
awk
to the rescue!
$ awk '{f2[$1]=f2[$1] sep[$1] $2; # concatenate 2nd field
f3[$1]=f3[$1] sep[$1] $3; # concatenate 3rd field
sep[$1]=","} # lazy init separator to skip first
END {for(k in f2) print k,f2[k],f3[k]}' file | # iterate over keys and print
column -t # pretty print
10013 97,196,61,1559 1503384,1506234,1507385,1508385
10014 1726,960 1514507,1519162
10015 1920,124,77 1545535,1548915,1550284
note the output order is not guaranteed, but you can sort by the first field.
Upvotes: 1
Reputation: 92854
The shortest GNU datamash
solution:
datamash -sW -g1 collapse 2 collapse 3 <file
-g1
- group by the 1st fieldcollapse N
- operation producing comma-separated list of all input values of the field N
within each groupThe output:
10013 97,196,61,1559 1503384,1506234,1507385,1508385
10014 1726,960 1514507,1519162
10015 1920,124,77 1545535,1548915,1550284
Upvotes: 5