I have a file:
input.txt:
a_1_bcd
ab_1_e
i_2_gxyz
la_3_df
de_3_fg
ff_3_hi
I treat the part between the first and second underscores as an ID, and I want to put all the lines sharing the same ID on a single line. Note: before doing this, I have to surround each line with "<" and ">" characters.
So, I want to get
output.txt:
<a_1_bcd><ab_1_e>
<i_2_gxyz>
<la_3_df><de_3_fg><ff_3_hi>
This looks simple, and I found a way to do it using loops and arrays, but my solution looks ugly. How would you solve this efficiently and simply?
Using awk:
awk -F_ '!a[$2]{b[++k]=$2} {a[$2]=a[$2] "<" $0 ">"}
END {for (i=1; i<=k; i++) print a[b[i]]}' file
<a_1_bcd><ab_1_e>
<i_2_gxyz>
<la_3_df><de_3_fg><ff_3_hi>
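Here is the order-preserving awk program above, spelled out with comments. This sketch wraps the same awk code in a shell function. The function name group_by_id is hypothetical, and the sample data is piped in on standard input instead of being read from input.txt.

```shell
# group_by_id: hypothetical helper wrapping the order-preserving awk command.
# It reads the files given as arguments, or standard input if there are none.
group_by_id() {
  awk -F_ '
    # a[$2] is still empty the first time an ID is seen, so record that ID
    # in b[] under the next sequence number k.
    !a[$2] { b[++k] = $2 }
    # Wrap the whole line in "<" and ">" and append it to its ID group.
    { a[$2] = a[$2] "<" $0 ">" }
    # Print the groups in the order their IDs first appeared.
    END { for (i = 1; i <= k; i++) print a[b[i]] }
  ' "$@"
}

# Usage on the sample data from the question:
printf '%s\n' a_1_bcd ab_1_e i_2_gxyz la_3_df de_3_fg ff_3_hi | group_by_id
```

You can also pass file names, for example `group_by_id input.txt > output.txt`.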
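Array b is used only to keep the original order of the keys. Each line is wrapped in "<" and ">" and appended to its group by the a[$2] "<" $0 ">" expression.
Simplified version that doesn't keep the original ordering (the traversal order of for (i in a) is unspecified, so the groups may be printed in any order):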
awk -F_ '{a[$2]=a[$2] "<" $0 ">"} END{for (i in a) print a[i]}' file
<i_2_gxyz>
<la_3_df><de_3_fg><ff_3_hi>
<a_1_bcd><ab_1_e>
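If lines that share an ID are always consecutive, as they are in the sample input, you can preserve order without storing the whole file. The version below is a sketch that is not part of the original answer, and it assumes that adjacency holds. It remembers only the previous line's ID and starts a new output line whenever the ID changes. If an ID reappears later, it starts a second group instead of being merged. The function name group_adjacent is hypothetical.

```shell
# group_adjacent: hypothetical helper; ASSUMES lines with the same ID
# (second "_"-separated field) are adjacent in the input.
group_adjacent() {
  awk -F_ '
    {
      id = $2 ""                        # force string comparison of IDs
      if (NR > 1 && id != prev) printf "\n"   # ID changed: end previous group
      prev = id
      printf "<%s>", $0                 # append wrapped line, no newline yet
    }
    END { if (NR > 0) printf "\n" }     # terminate the last group
  ' "$@"
}

# Usage on the sample data from the question:
printf '%s\n' a_1_bcd ab_1_e i_2_gxyz la_3_df de_3_fg ff_3_hi | group_adjacent
```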