Gunther
Gunther

Reputation: 129

merge rows based on field value linux

I'm trying to merge columns based on the value in the first field. I've tried using awk, but to no avail. Please see example input and output:

Input:  
10013   97      1503384  
10013   196     1506234  
10013   61      1507385  
10013   1559    1508385  
10014   1726    1514507  
10014   960     1519162  
10015   1920    1545535  
10015   124     1548915  
10015   77      1550284  

Desired_Output:  
10013   97,196,61,1559  1503384,1506234,1507385,1508385  
10014   1726,960        1514507,1519162  
10015   1920,124,77     1545535,1548915,1550284  

Thanks in advance for any advice!

Upvotes: 3

Views: 971

Answers (4)

Ed Morton
Ed Morton

Reputation: 203229

$ cat tst.awk
$1 != f1 { if (NR>1) print f1, f2, f3; f1=f2=f3=s="" }
{ f1=$1; f2=f2 s $2; f3=f3 s $3; s="," }
END { print f1, f2, f3 }

$ awk -f tst.awk file | column -t
10013  97,196,61,1559  1503384,1506234,1507385,1508385
10014  1726,960        1514507,1519162
10015  1920,124,77     1545535,1548915,1550284

Upvotes: 2

RomanPerekhrest
RomanPerekhrest

Reputation: 92854

Awk solution (assuming that the input lines are already sorted):

awk '!a[$1]++{ if ("f2" in b) { print f1, b["f2"], b["f3"]; delete b } }
     { 
         f1=$1; 
         b["f2"]=(b["f2"]!=""? b["f2"]",":"")$2; 
         b["f3"]=(b["f3"]!=""? b["f3"]",":"")$3 
     }
     END{ print f1, b["f2"], b["f3"] }' OFS='\t file
  • delete b - with this action we'll prevent the array b from holding all values during the processing (saving memory). It will be cleared on each unique 1st field value

The output:

10013   97,196,61,1559  1503384,1506234,1507385,1508385
10014   1726,960    1514507,1519162
10015   1920,124,77 1545535,1548915,1550284

Upvotes: 0

karakfa
karakfa

Reputation: 67467

awk to the rescue!

$ awk '{f2[$1]=f2[$1] sep[$1] $2;                   # concatenate 2nd field 
        f3[$1]=f3[$1] sep[$1] $3;                   # concatenate 3rd field 
        sep[$1]=","}                                # lazy init separator to skip first
   END {for(k in f2) print k,f2[k],f3[k]}' file |   # iterate over keys and print
  column -t                                         # pretty print


10013  97,196,61,1559  1503384,1506234,1507385,1508385
10014  1726,960        1514507,1519162
10015  1920,124,77     1545535,1548915,1550284

note the output order is not guaranteed, but you can sort by the first field.

Upvotes: 1

RomanPerekhrest
RomanPerekhrest

Reputation: 92854

The shortest GNU datamash solution:

datamash -sW -g1 collapse 2 collapse 3 <file
  • -g1 - group by the 1st field
  • collapse N - operation producing comma-separated list of all input values of the field N within each group

The output:

10013   97,196,61,1559  1503384,1506234,1507385,1508385
10014   1726,960    1514507,1519162
10015   1920,124,77 1545535,1548915,1550284

Upvotes: 5

Related Questions