dmattbailey
dmattbailey

Reputation: 21

awk - merge lines on the basis of multiple field matching

I am new to shell commands and am trying to take the following:

05 ; 47.89 ; 971  
05 ; 49.23 ; 971
05 ; 1.44 ; 971
10 ; 61.42 ; 508
10 ; 37.40 ; 508
10 ; 0.39 ; 508
15 ; 45.09 ; 173
15 ; 52.60 ; 173
15 ; 0.00 ; 173

And output:

05 ; 971 ; 47.89 ; 49.23 ; 1.44
10 ; 508 ; 61.42 ; 37.40 ; 0.39
15 ; 173 ; 45.09 ; 52.60 ; 0.00

So far I tried:

awk -F';' '{ x[$1]=x[$1] " " $2; y[$2]=y[$2] " " $1; } END { for (k in x) print k,x[k];}' graphtemp.txt

which I know doesn't print column 3 as needed. I appreciate any help.

Upvotes: 2

Views: 2203

Answers (3)

Zsolt Botykai
Zsolt Botykai

Reputation: 51613

awk -F' *; *' '{a[$1 ";" $3]=a[$1 ";" $3] ";" $2} END {for (k in a) {print k a[k]}}' INPUTFILE

This might work for you/give you a starting point, if you really has the exact input file format.

See here in action: http://ideone.com/r44wjm

Note: this will give you an unsorted output (you can sort an array with (g)awk, but I'm leaving that to you.

How it works:

  1. creates an array element for every $1 ";" $3 (key), and appends the 2nd column to its value.
  2. END loops over the keys of the array, print the key, and value.

Upvotes: 0

anubhava
anubhava

Reputation: 785156

Try this awk command:

awk 'BEGIN{FS=OFS=";"} {c=$1 FS $3; if (c in a) a[c]=a[c] FS $2; else a[c]=$2}
     END{for (k in a) print k, a[k]}' file

EDIT: Looks like you've \r\n at end of each line, try this modified code instead:

awk -F '[ ;\r]+' 'BEGIN{OFS=";"} {c=$1 OFS $3;
  if (c in a) a[c]=a[c] OFS $2; else a[c]=$2} END{for (k in a) print k, a[k]}' 

Upvotes: 2

(I overlooked the requirement to use semicolons as the output field separator. Because I'm using concatenated keys below, I found it easier to manage semicolons inline rather than declaring OFS in a BEGIN block.)

Build a concatenated key for an array. Concatenate the second column to the existing values in the array. Print each key and value to stdout.

$ cat test.awk
{
   x = sprintf("%s ; %s", $1, $3)
   if (x in a ) 
      a[x] = sprintf("%s ; %s", a[x], $2) 
  else
      a[x] = "; " $2
}
END {
    for (k in a) print k, a[k]
}

Sort for convenience. It doesn't matter how many rows have the same values for the first and third columns in the input file.

$ awk -F";" -f test.awk test.dat | sort
05  ;  971 ;  47.89  ;  49.23  ;  1.44 
10  ;  508 ;  61.42  ;  37.40  ;  0.39 
15  ;  173 ;  45.09  ;  52.60  ;  0.00 

Upvotes: 0

Related Questions