awk - collecting the similar lines in a single line

Question

I have a file:

input.txt:

a_1_bcd
ab_1_e
i_2_gxyz
la_3_df
de_3_fg
ff_3_hi

I treat the part between the first and second underscores as an ID, and I want to put all lines sharing the same ID on a single line. Before joining them, each line has to be surrounded by "<" and ">" characters.
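A quick way to check which part awk treats as the ID is to split each line on underscores with -F_ and print the second field next to the wrapped line. This is a sketch; the here-document simply stands in for input.txt.

```shell
# -F_ splits each line on "_", so $2 is the text between the first and second underscores.
# Each output line shows the ID, then the original line wrapped in < and >.
awk -F_ '{ print $2, "<" $0 ">" }' <<'EOF'
a_1_bcd
ab_1_e
i_2_gxyz
la_3_df
de_3_fg
ff_3_hi
EOF
```

For the sample data this prints 1 <a_1_bcd>, 1 <ab_1_e>, 2 <i_2_gxyz>, and so on, one line per input line.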

So, I want to get

output.txt:

<a_1_bcd><ab_1_e>
<i_2_gxyz>
<la_3_df><de_3_fg><ff_3_hi>

This looks simple, and I found a way to do it using loops and arrays, but my solution looks ugly. How would you solve this simply and efficiently?

anubhava · Accepted Answer

Using awk:

awk -F_ '!a[$2]{b[++k]=$2} {a[$2]=a[$2] "<" $0 ">"}
            END {for (i=1; i<=k; i++) print a[b[i]]}' file

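-F_ sets the field separator to an underscore, so $2 is the ID.
It uses two associative arrays: a, where the ID is the key and the value is all the corresponding lines,
and b, which is used only to keep the original order of the keys. The pattern !a[$2] is true only the first time an ID is seen, so b[++k]=$2 records each new ID once.
For every line, the expression a[$2] "<" $0 ">" appends the current line, wrapped in "<" and ">", to the value already stored for its ID.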
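The same program can be run on the question's sample data without creating a file. This sketch only adds comments and feeds the lines through a here-document in place of file; the logic is unchanged.

```shell
awk -F_ '
  # First time an ID ($2) is seen, record it in b so the original order is kept
  !a[$2] { b[++k] = $2 }
  # Append the current line, wrapped in < and >, to the group for its ID
  { a[$2] = a[$2] "<" $0 ">" }
  # Print one line per ID, in order of first appearance
  END { for (i = 1; i <= k; i++) print a[b[i]] }
' <<'EOF'
a_1_bcd
ab_1_e
i_2_gxyz
la_3_df
de_3_fg
ff_3_hi
EOF
```

Because the grouping goes through the array, lines with the same ID do not need to be adjacent in the input; they are still collected onto the line of the first occurrence of that ID.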

A simpler version that does not preserve the order of IDs (the traversal order of for (i in a) is unspecified in awk):

awk -F_ '{a[$2]=a[$2] "<" $0 ">"} END{for (i in a) print a[i]}' file
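If, as in input.txt, all lines sharing an ID are always consecutive, a different approach (not part of the answer above) avoids holding every group in memory: print the accumulated group whenever the ID changes. This is a sketch under that adjacency assumption; if an ID reappears later in the file, its lines end up on a separate output line.

```shell
# Streaming grouping: assumes lines with the same ID are adjacent in the input
awk -F_ '
  # ID changed (and this is not the first line): flush the previous group.
  # Concatenating "" forces a string comparison, so IDs such as 01 and 1 stay distinct.
  NR > 1 && ($2 "") != prev { print line; line = "" }
  # Append the current line, wrapped in < and >, and remember its ID
  { line = line "<" $0 ">"; prev = $2 "" }
  # Flush the last group, if any input was read
  END { if (NR > 0) print line }
' <<'EOF'
a_1_bcd
ab_1_e
i_2_gxyz
la_3_df
de_3_fg
ff_3_hi
EOF
```

For the sample data this produces the same three lines as the ordered version, while keeping only the current group in memory.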
