I have an instrumented log file in which each first-column value (ID) is duplicated across six lines, as below.
//SC001@1/1/1@1/1,get,ClientStart,1363178707755
//SC001@1/1/1@1/1,get,TalkToSocketStart,1363178707760
//SC001@1/1/1@1/1,get,DecodeRequest,1363178707765
//SC001@1/1/1@1/1,get-reply,EncodeReponse,1363178707767
//SC001@1/1/1@1/2,get,DecodeRequest,1363178708765
//SC001@1/1/1@1/2,get-reply,EncodeReponse,1363178708767
//SC001@1/1/1@1/2,get,TalkToSocketEnd,1363178708770
//SC001@1/1/1@1/2,get,ClientEnd,1363178708775
//SC001@1/1/1@1/1,get,TalkToSocketEnd,1363178707770
//SC001@1/1/1@1/1,get,ClientEnd,1363178707775
//SC001@1/1/1@1/2,get,ClientStart,1363178708755
//SC001@1/1/1@1/2,get,TalkToSocketStart,1363178708760
Note: the comma (,) is the delimiter here.
Likewise, there are many duplicated first-column values (IDs) in the log file (the example above has only two: //SC001@1/1/1@1/1 and //SC001@1/1/1@1/2). I need to consolidate the log records into the format below.
ID,ClientStart,TalkToSocketStart,DecodeRequest,EncodeReponse,TalkToSocketEnd,ClientEnd
//SC001@1/1/1@1/1,1363178707755,1363178707760,1363178707765,1363178707767,1363178707770,1363178707775
//SC001@1/1/1@1/2,1363178708755,1363178708760,1363178708765,1363178708767,1363178708770,1363178708775
I suppose I need a bash script for this exercise and would appreciate expert support with it. I hope there is a sed or awk solution, which would be more efficient.
Thanks much
One way:
sort -t, -k4n,4 file | awk -F, '{a[$1]=a[$1]?a[$1] FS $NF:$NF;}END{for(i in a){print i","a[i];}}'
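For readability, the same pipeline can be written with the awk body expanded (behavior unchanged; file is a placeholder for your log file name):

sort -t, -k4n,4 file | awk -F, '
{
    # Append the timestamp (last field) to the entry for this ID ($1),
    # joining the values with FS, i.e. a comma.
    a[$1] = a[$1] ? a[$1] FS $NF : $NF
}
END {
    # Emit one consolidated line per ID. Note that "for (i in a)"
    # visits the keys in no guaranteed order.
    for (i in a)
        print i "," a[i]
}'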
The sort command orders the file numerically on the last (4th) column, the timestamp, so the events for each ID come out in chronological order. awk then reads the sorted input and builds an array keyed on the 1st field (the ID), appending each line's last field (the timestamp) to that key's value; the END block prints every ID followed by its accumulated timestamps.
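Note that this prints only the data rows. If you also want the header line from your desired output, one way (assuming the fixed set of six event names shown in the question; consolidated.csv is a placeholder output name) is to prepend it:

{
  echo 'ID,ClientStart,TalkToSocketStart,DecodeRequest,EncodeReponse,TalkToSocketEnd,ClientEnd'
  sort -t, -k4n,4 file | awk -F, '{a[$1]=a[$1]?a[$1] FS $NF:$NF} END{for(i in a) print i","a[i]}'
} > consolidated.csv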