lyrically wicked

Reputation: 1417

awk - collecting similar lines into a single line

I have a file:

input.txt:

a_1_bcd
ab_1_e
i_2_gxyz
la_3_df
de_3_fg
ff_3_hi

I treat the part between the first and second underscores as an ID, and I want to put all the lines sharing the same ID on a single line. Note: before doing this, I have to surround each line with "<" and ">" characters.

So, I want to get

output.txt:

<a_1_bcd><ab_1_e>
<i_2_gxyz>
<la_3_df><de_3_fg><ff_3_hi>

This looks simple, and I found a way of doing it with loops and arrays, but my solution looks ugly. How would you solve this efficiently and simply?

Upvotes: 2

Views: 84

Answers (1)

anubhava

Reputation: 785196

Using awk:

awk -F_ '!a[$2]{b[++k]=$2} {a[$2]=a[$2] "<" $0 ">"}
            END {for (i=1; i<=k; i++) print a[b[i]]}' file
<a_1_bcd><ab_1_e>
<i_2_gxyz>
<la_3_df><de_3_fg><ff_3_hi>
  • Uses two associative arrays: in a, the key is the ID and the value is all of the corresponding lines joined together.
  • Array b only records the order in which the keys first appear.
  • For every line, the wrapped line is appended to the value stored for its key using the a[$2] "<" $0 ">" expression.
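
For readability, the same two-array idea can be sketched as a commented shell function. This is only an illustration: the function name group_by_id and the array names lines and order are made up here, and the first-seen test uses awk's `in` operator instead of `!a[$2]`, which is a stylistic choice rather than a fix.

```shell
# group_by_id is a hypothetical helper name; it reads the named files or stdin.
group_by_id() {
  awk -F_ '
    # First time this ID is seen: remember its position in the input order.
    !($2 in lines) { order[++n] = $2 }
    # Append the line, wrapped in < and >, to the bucket for its ID.
    { lines[$2] = lines[$2] "<" $0 ">" }
    # Print the buckets in the order their IDs first appeared.
    END { for (i = 1; i <= n; i++) print lines[order[i]] }
  ' "$@"
}

# Example run on the sample data from the question.
printf '%s\n' a_1_bcd ab_1_e i_2_gxyz la_3_df de_3_fg ff_3_hi | group_by_id
```

Because the order is tracked explicitly, lines with the same ID do not have to be adjacent in the input.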

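Simplified version that does not preserve the original order (awk leaves the iteration order of for (i in a) unspecified, so your output order may differ):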

awk -F_ '{a[$2]=a[$2] "<" $0 ">"} END{for (i in a) print a[i]}' file
<i_2_gxyz>
<la_3_df><de_3_fg><ff_3_hi>
<a_1_bcd><ab_1_e>
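
If the lines sharing an ID are always adjacent, as they are in the sample input, a streaming variant without any arrays may be enough. This is a sketch under that assumption only; the function name group_adjacent and the variables buf and prev are made up for illustration. When the same ID reappears later in the file, it starts a new output line instead of being merged.

```shell
# group_adjacent is a hypothetical helper name; it assumes equal IDs are contiguous.
group_adjacent() {
  awk -F_ '
    # The ID changed: print the finished group and start a new one.
    NR > 1 && $2 != prev { print buf; buf = "" }
    # Append the wrapped line to the current group and remember its ID.
    { buf = buf "<" $0 ">"; prev = $2 }
    # Print the last group, unless the input was empty.
    END { if (NR) print buf }
  ' "$@"
}

# Example run on the sample data from the question.
printf '%s\n' a_1_bcd ab_1_e i_2_gxyz la_3_df de_3_fg ff_3_hi | group_adjacent
```

This keeps the input order and only holds one group in memory at a time.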

Upvotes: 3
