Vipin Choudhary

Reputation: 341

awk | merge lines on the basis of a matching field

I need help with the following:

Input file:

abc message=sent session:111,x,y,z
pqr message=receive session:111,4,5,7
abc message=sent session:123,x,y,z
pqr message=receive session:123,4,5,7
abc message=sent session:342,x,y,z
abc message=sent session:589,x,y,z
pqr message=receive session:589,4,5,7

Output file:

abc message=sent session:111,x,y,z, pqr message=receive session:111,4,5,7
abc message=sent session:123,x,y,z, pqr message=receive session:123,4,5,7
abc message=sent session:342,x,y,z, NOMATCH
abc message=sent session:589,x,y,z, pqr message=receive session:589,4,5,7

Notes:

In the source file, every "sent" message has a matching "receive" message, except for session 342, which has no receive.
The session numbers are not known in advance, so they can't be hardcoded.
Merge only the sent and receive lines whose session numbers match; a sent line with no matching receive should be followed by NOMATCH.

Upvotes: 3

Views: 608

Answers (2)

Guru

Reputation: 16974

Another way:

awk -F "[:,]"  '/=sent/{a[$2]=$0;}/=receive/{print a[$2], $0;delete a[$2];}END{for(i in a)print a[i],"NO MATCH";}' file

Results:

abc message=sent session:111,x,y,z pqr message=receive session:111,4,5,7
abc message=sent session:123,x,y,z pqr message=receive session:123,4,5,7
abc message=sent session:589,x,y,z pqr message=receive session:589,4,5,7
abc message=sent session:342,x,y,z NO MATCH

When a sent record is encountered, it is stored in the array a, indexed by its session ID ($2, since the field separator is either : or ,). When a receive record is encountered, the matching sent record is fetched from the array, printed together with the receive record, and then deleted from the array. At the END, any records still left in the array have no matching receive, so they are printed with NO MATCH.
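A minimal sketch of a variant (not part of the answer above): it uses the same session-keyed array lookup, but buffers both record types until END so that the output follows the order of the sent lines in the input and uses the question's exact ", " separator and NOMATCH marker. It assumes each session ID appears on at most one sent line and one receive line; the shell function name merge_sessions is hypothetical.

```sh
# Hypothetical helper (not from the original answer): pairs "sent" and
# "receive" lines by session id and prints them in input order.
# Assumes each session id has at most one sent and one receive line.
merge_sessions() {
    awk -F '[:,]' '
        /=sent/ {
            order[++n] = $2   # record the order in which sessions were sent
            sent[$2] = $0     # sent line, keyed by session id ($2)
            next
        }
        /=receive/ {
            recv[$2] = $0     # receive line, keyed by session id ($2)
        }
        END {
            for (k = 1; k <= n; k++) {
                id = order[k]
                # question format: "<sent>, <receive>" or "<sent>, NOMATCH"
                print sent[id] ", " ((id in recv) ? recv[id] : "NOMATCH")
            }
        }
    ' "$@"
}

# Example run on the input from the question
printf '%s\n' \
    'abc message=sent session:111,x,y,z' \
    'pqr message=receive session:111,4,5,7' \
    'abc message=sent session:123,x,y,z' \
    'pqr message=receive session:123,4,5,7' \
    'abc message=sent session:342,x,y,z' \
    'abc message=sent session:589,x,y,z' \
    'pqr message=receive session:589,4,5,7' |
merge_sessions
```

Because nothing is printed until END, a receive line may also appear before its sent line. The trade-off is that every record is held in memory until the input ends.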

Upvotes: 1

Steve

Reputation: 54392

Here's one way using awk. Run it like this:

awk -f script.awk file

Contents of script.awk:

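{
    # keep the original line before $0 is modified
    x = $0

    # reduce $0 to the session id: strip the "abc message=sent session:"
    # style prefix and everything from the first comma onward
    gsub(/[^:]*:|,.*/,"")

    # append the line to any line already stored for this session id,
    # joined by "," plus FS (a single space by default)
    a[$0] = (a[$0] ? a[$0] "," FS : "") x

    # count how many lines were seen for this session id
    b[$0]++
}

END {
    for (i in a) {
        # two lines means a sent/receive pair; otherwise append NOMATCH
        print (b[i] == 2 ? a[i] : a[i] "," FS "NOMATCH") | "sort"
    }
}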

Results:

abc message=sent session:111,x,y,z, pqr message=receive session:111,4,5,7
abc message=sent session:123,x,y,z, pqr message=receive session:123,4,5,7
abc message=sent session:342,x,y,z, NOMATCH
abc message=sent session:589,x,y,z, pqr message=receive session:589,4,5,7

Alternatively, here's the one-liner:

awk '{ x = $0; gsub(/[^:]*:|,.*/,""); a[$0] = (a[$0] ? a[$0] "," FS : "") x; b[$0]++ } END { for (i in a) print (b[i] == 2 ? a[i] : a[i] "," FS "NOMATCH") | "sort" }' file

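Note that you can drop the pipe to sort if you don't care about the output order; without it, the lines come out in whatever order for (i in a) visits the keys, which awk leaves unspecified. HTH.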
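To see how the script and the one-liner build their array keys, the following sketch runs only the gsub() step on two lines from the question. On this input, the regex /[^:]*:|,.*/ deletes the "abc message=sent session:" prefix and everything from the first comma onward, so the sent and the receive line reduce to the same key and land in the same a[] entry. The function name session_key is hypothetical, used only for illustration.

```sh
# Hypothetical helper (for illustration only): applies the same gsub() as
# script.awk and prints what is left of each line, i.e. the session id.
session_key() {
    awk '{ gsub(/[^:]*:|,.*/, ""); print }' "$@"
}

# Both lines reduce to the same key, 111
printf '%s\n' \
    'abc message=sent session:111,x,y,z' \
    'pqr message=receive session:111,4,5,7' |
session_key
```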

Upvotes: 1
