Alexandros

Reputation: 2200

Grouping elements by two fields on a space delimited file

I have this data in a space-delimited file, ordered by column 2, then 3, and then 1 (I used Linux sort to do that):

0 0 2
1 0 2
2 0 2
1 1 4
2 1 4

I want to create a new file (leaving the old file as is) containing:

0 2 0,1,2
1 4 1,2

Basically, put fields 2 and 3 first and group the values of field 1 by them, as a comma-separated list. Is there a way to do that with an awk, sed, or bash one-liner, so that I can avoid writing a Java or C++ app for it?

Upvotes: 0

Views: 194

Answers (6)

potong

Reputation: 58473

This might work for you (GNU sed):

sed -r ':a;$!N;/(. (. .).*)\n(.) \2.*/s//\1,\3/;ta;s/(.) (.) (.)/\2 \3 \1/;P;D' file

This appends the first column of the subsequent record to the first record until the second and third keys change. Then the fields in the first record are re-arranged and printed out.

This relies on the single-character fields in the data presented, but it can be adapted for more complex data.
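As one possible adaptation for fields wider than one character, here is a minimal sketch that swaps each `.` for a "run of non-space characters" class. It assumes GNU sed, fields that contain no spaces or commas, and input already grouped on columns 2 and 3; the file names `wide.txt` and `wide_grouped.txt` are made up for the illustration.

```shell
# Hypothetical input with multi-digit fields, grouped on columns 2 and 3.
printf '%s\n' '10 0 2' '11 0 2' '12 0 2' '7 10 4' '8 10 4' > wide.txt

# Same loop as above: append column 1 of the next record while the
# (column 2, column 3) key repeats, then move the key to the front.
sed -E ':a;$!N;s/^([^ ]+ ([^ ]+ [^ ,\n]+)[^\n]*)\n([^ ]+) \2$/\1,\3/;ta;s/^([^ ]+) ([^ ]+) ([^ ,\n]+)/\2 \3 \1/;P;D' wide.txt > wide_grouped.txt

cat wide_grouped.txt
```

The `$` after `\2` keeps a key such as `0 2` from matching the tail of a longer key such as `10 2`.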

Upvotes: 0

glenn jackman

Reputation: 247012

Yet another take:

awk -v SUBSEP=" " '
    {group[$2,$3] = group[$2,$3] $1 ","} 
    END {
        for (g in group) {
            sub(/,$/,"",group[g])
            print g, group[g]
        }
    }
' file > newfile

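SUBSEP is the string awk uses to join the subscripts of a multi-index expression such as group[$2,$3] into a single key of its one-dimensional associative array. Setting it to a space makes each key print as "0 2". Note that for (g in group) visits the keys in an unspecified order, so the output lines are not guaranteed to come out sorted.
http://www.gnu.org/software/gawk/manual/html_node/Multidimensional.html#Multidimensional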
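If you would rather leave SUBSEP at its default, a variant along these lines should also work: it splits each key back into its two parts with split() and pipes the result through sort, since for-in order is unspecified. This is only a sketch; the numeric sort keys assume the key columns are numbers, as in the sample data.

```shell
# Recreate the sample input from the question.
printf '%s\n' '0 0 2' '1 0 2' '2 0 2' '1 1 4' '2 1 4' > file

awk '
    { group[$2,$3] = group[$2,$3] $1 "," }    # subscript is $2 SUBSEP $3
    END {
        for (g in group) {
            split(g, key, SUBSEP)             # recover the two key fields
            sub(/,$/, "", group[g])           # drop the trailing comma
            print key[1], key[2], group[g]
        }
    }
' file | sort -k1,1n -k2,2n > newfile

cat newfile
```

Splitting the key keeps the approach usable when the key fields themselves might contain the character you would otherwise pick for SUBSEP.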

Upvotes: 0

anubhava

Reputation: 785481

Using awk:

awk '{k=$2 OFS $3} !(k in a){a[k]=$1; b[++n]=k; next} {a[k]=a[k] "," $1}
     END{for (i=1; i<=n; i++) print b[i],a[b[i]]}' file
0 2 0,1,2
1 4 1,2

Upvotes: 1

konsolebox

Reputation: 75548

awk 'a[$2, $3]++ { p = p "," $1; next } p { print p } { p = $2 FS $3 FS $1 } END { if (p) print p }' file

Output:

0 2 0,1,2
1 4 1,2
  • The solution assumes the data is sorted on the second and third columns, so that all records with the same key are adjacent.
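
If the input is not ordered yet, one option is to sort it in the same pipeline before grouping. A rough sketch, assuming numeric fields; `unsorted.txt` and `grouped.txt` are made-up file names for the illustration:

```shell
# Hypothetical unsorted input holding the same records as in the question.
printf '%s\n' '2 1 4' '0 0 2' '1 1 4' '2 0 2' '1 0 2' > unsorted.txt

# Sort numerically by column 2, then 3, then 1, and group the sorted stream.
sort -k2,2n -k3,3n -k1,1n unsorted.txt |
awk 'a[$2, $3]++ { p = p "," $1; next } p { print p } { p = $2 FS $3 FS $1 } END { if (p) print p }' > grouped.txt

cat grouped.txt
```

The original file is only read, so it stays as it is.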

Upvotes: 1

Kent

Reputation: 195169

With your input and expected output, this line may help:

 awk '{f=$2 FS $3}!(f in a){i[++p]=f;a[f]=$1;next}
      {a[f]=a[f]","$1}END{for(x=1;x<=p;x++)print i[x],a[i[x]]}' file

test:

kent$  cat f
0 0 2
1 0 2
2 0 2
1 1 4
2 1 4

kent$  awk '{f=$2 FS $3}!(f in a){i[++p]=f;a[f]=$1;next}{a[f]=a[f]","$1}END{for(x=1;x<=p;x++)print i[x],a[i[x]]}' f
0 2 0,1,2
1 4 1,2

Upvotes: 1

jaypal singh

Reputation: 77135

Since the file is already ordered, you can print each group as soon as the key (fields 2 and 3) changes:

awk '
  seen==$2 FS $3 { line=line "," $1; next }
  { if(seen) print seen, line; seen=$2 FS $3; line=$1 }
  END { print seen, line }
' file
0 2 0,1,2
1 4 1,2

This preserves the input order of the groups in the output.
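
A slightly more defensive variant of the same idea might look like the sketch below. It tracks the key in `prev` (a name chosen here for illustration), only flushes in END when at least one record was read, so an empty input produces no stray line, and it writes to `newfile` so the original stays untouched.

```shell
# Recreate the sample input from the question.
printf '%s\n' '0 0 2' '1 0 2' '2 0 2' '1 1 4' '2 1 4' > file

awk '
  { key = $2 FS $3 }
  NR > 1 && key == prev { line = line "," $1; next }   # same group: append
  NR > 1 { print prev, line }                          # key changed: flush
  { prev = key; line = $1 }                            # start a new group
  END { if (NR) print prev, line }                     # nothing to flush on empty input
' file > newfile

cat newfile
```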

Upvotes: 2
