Reputation: 845
I have the following tab-delimited file:
string1 string2 string3 001 string4
string5 string6 string7 002 string8
string9 string10 string11 003 string12
string13 string14 string15 002 string16
I want to use awk to print each identifier from column 4 (the number), followed by a list of all the matching values from column 5:
001 string4
002 string8, string16
003 string12
My current attempt fails:
awk 'BEGIN{FS=OFS="\t"} $4 ~ /^K/ { print $4, print $5 }'
I also do not know how to print the matches as a list in the second column.
Upvotes: 0
Views: 525
Reputation: 85885
Use Awk as below:
awk 'BEGIN{FS=OFS="\t"} {if ($4 in unique) unique[$4] = unique[$4] OFS $5; else unique[$4] = $5} END{for (i in unique) print i, unique[i]}' file
which produces the output below. Note that for (i in unique) does not retain the input order; this assumes the order does not matter.
002 string8 string16
003 string12
001 string4
If you want comma-separated values as in the question, use
awk 'BEGIN{FS=OFS="\t"}{unique[$4]=(unique[$4]?(unique[$4]","$5):($5)); next}END{for (i in unique) print i,unique[i]}' file
to produce the output
002 string8,string16
003 string12
001 string4
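The comma-joining script above can be written out with comments and run against the question's sample data. This is a sketch: the function name join_by_id is hypothetical, the input file is assumed to be called file with real tab characters, and the printf lines only recreate the sample. Because for (i in ...) visits keys in an unspecified order, the output is piped through sort here to make it repeatable.

```shell
#!/bin/sh
# Hypothetical helper name; wraps the comma-joining awk script from the answer.
join_by_id() {
    awk '
    BEGIN { FS = OFS = "\t" }      # tab-separated input and output
    {
        # Nothing stored yet for this id ($4): store $5 as-is.
        # Otherwise: append "," and $5 to the stored string.
        unique[$4] = (unique[$4] ? unique[$4] "," $5 : $5)
    }
    END {
        # "for (i in ...)" visits keys in an unspecified order.
        for (i in unique) print i, unique[i]
    }' "$1"
}

# Recreate the sample input from the question with real tab characters.
printf 'string1\tstring2\tstring3\t001\tstring4\n'     >  file
printf 'string5\tstring6\tstring7\t002\tstring8\n'     >> file
printf 'string9\tstring10\tstring11\t003\tstring12\n'  >> file
printf 'string13\tstring14\tstring15\t002\tstring16\n' >> file

# Sort only so the output order is repeatable.
join_by_id file | sort
```

To match the question's "string8, string16" exactly, change "," to ", " in the script. Because the ternary tests whether the stored value is truthy, a first value of 0 would be overwritten; testing ($4 in unique) in a separate if statement avoids that.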
The idea is:
Awk processes files one line at a time. An associative (hash-map) array unique is built with $4 as the index and $5 as the value. When more than one $5 is present for an index, each new value is appended to the existing one with a , delimiter. The ternary operator handles this: if the array element already has a value, it appends , and the new value; if it is empty, it assigns $5 directly. The END clause then prints the resulting array, each key followed by its collected values.
Upvotes: 1
Reputation: 133760
@tobi: try:
awk 'FNR==NR{A[$4]=A[$4]?A[$4]","$NF:$NF;next} ($4 in A){print $4,A[$4];delete A[$4]}' Input_file Input_file
The condition FNR==NR is true only while the first Input_file is being read. During that first pass, the script builds an array named A indexed by $4, appending each line's last column ($NF) to the existing value with a , separator; next skips the remaining statements. During the second pass, for each line whose $4 is in A, it prints $4 and the collected values, then deletes that entry so each identifier is printed only once, in order of first appearance.
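The two-pass approach reads the file twice to keep the original order. A single-pass alternative (a different technique from the answer's) records each new id in a counter-indexed array the first time it is seen, then prints in that order in END. This is a sketch assuming tab-separated input; the function name join_in_order is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: one pass, ids printed in order of first appearance.
join_in_order() {
    awk '
    BEGIN { FS = OFS = "\t" }
    {
        if (!($4 in vals)) {
            order[++n] = $4        # remember the position where this id first appeared
            vals[$4] = $5
        } else {
            vals[$4] = vals[$4] "," $5
        }
    }
    END {
        # Walk ids in first-seen order instead of "for (i in ...)" order.
        for (k = 1; k <= n; k++) print order[k], vals[order[k]]
    }' "$1"
}

# Recreate the sample input from the question with real tab characters.
printf 'string1\tstring2\tstring3\t001\tstring4\n'     >  file
printf 'string5\tstring6\tstring7\t002\tstring8\n'     >> file
printf 'string9\tstring10\tstring11\t003\tstring12\n'  >> file
printf 'string13\tstring14\tstring15\t002\tstring16\n' >> file

join_in_order file
```

This keeps one extra array in memory but avoids reading the input twice, which matters when the input comes from a pipe rather than a file.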
Upvotes: 0