rororo

Reputation: 845

AWK: search one column, print list of matches in second column

I have the following TAB delimited file:

string1 string2 string3 001 string4
string5 string6 string7 002 string8
string9 string10 string11 003 string12
string13 string14 string15 002 string16

I want to use awk to print each distinct value in column 4 (the number is the identifier), followed by a list of all the column 5 values that share it:

001 string4
002 string8, string16
003 string12

My current attempt failed: awk 'BEGIN{FS=OFS="\t"} $4 ~ /^K/ { print $4, print $5 }'

I also do not know how to print the list of matches in column 2.

Upvotes: 0

Views: 525

Answers (2)

Inian

Reputation: 85885

Use Awk as below:

awk 'BEGIN{FS=OFS="\t"}{unique[$4]=(unique[$4] FS $5); next}END{for (i in unique) print i,unique[i]}' file

which produces the output below. Note that for (i in unique) visits the keys in an unspecified order, so the input order is not retained; I am assuming that does not matter here.

002     string8 string16
003     string12
001     string4

If you want comma-separated values as in the question, use

awk 'BEGIN{FS=OFS="\t"}{unique[$4]=(unique[$4]?(unique[$4]","$5):($5)); next}END{for (i in unique) print i,unique[i]}' file

to produce an output as

002 string8,string16
003 string12
001 string4

The idea is

  • Since Awk processes the file one line at a time, an associative array (hash map) named unique is built with $4 as the index and $5 as the value.
  • When more than one $5 value occurs for the same index, each new value is appended to the existing one with a , delimiter. The ternary operator handles this: if the array element already has a value, the new value is appended after a comma; if it is empty, $5 is assigned directly.
  • The END clause loops over the finished array and prints each key together with its collected values.
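
The steps above can be sketched as a commented, multi-line script. This is only a sketch under a few assumptions that are not part of the answer: the file name sample.tsv and the order array are made up for illustration, the separator is ", " to match the question's expected output, and keys are printed in order of first appearance rather than in the unspecified order of for (i in unique).

```shell
# Recreate the tab-delimited sample from the question (sample.tsv is a made-up name).
printf 'string1\tstring2\tstring3\t001\tstring4\n'     >  sample.tsv
printf 'string5\tstring6\tstring7\t002\tstring8\n'     >> sample.tsv
printf 'string9\tstring10\tstring11\t003\tstring12\n'  >> sample.tsv
printf 'string13\tstring14\tstring15\t002\tstring16\n' >> sample.tsv

result=$(awk '
BEGIN { FS = OFS = "\t" }
{
    if ($4 in unique) {
        # Key already seen: append the new column 5 value to its list.
        unique[$4] = unique[$4] ", " $5
    } else {
        # First time this key appears: remember its position and start the list.
        order[++n] = $4
        unique[$4] = $5
    }
}
END {
    # Replay the keys in the order they first appeared in the input.
    for (k = 1; k <= n; k++)
        print order[k], unique[order[k]]
}' sample.tsv)

printf '%s\n' "$result"
```

With the sample data this should print 001, 002 and 003 in that order, with "string8, string16" listed next to 002. The extra order array costs one more entry per key, which is the price of a stable output order in a single pass.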

Upvotes: 1

RavinderSingh13

Reputation: 133760

@tobi, try:

awk 'FNR==NR{A[$4]=A[$4]?A[$4]","$NF:$NF;next} ($4 in A){print $4,A[$4];delete A[$4]}'   Input_file  Input_file

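The condition FNR==NR is true only while the first copy of Input_file is being read. During that pass, array A is indexed by $4, and each line's last column ($NF) is appended to the value already stored for that index, separated by a comma; next then skips the remaining rules for that line. During the second read of Input_file, the condition ($4 in A) is true the first time each $4 value is seen, so $4 and its collected values are printed and the entry is deleted. Deleting the entry prevents duplicate output, and because printing happens while reading the file, the result follows the input order.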
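
To make the two-pass behaviour concrete, here is a minimal sketch that runs the same logic on the question's data. The file name input.tsv is made up, and setting FS and OFS to a tab is an assumption added here so that fields containing spaces would stay intact; the command above relies on awk's default whitespace splitting instead.

```shell
# Recreate the tab-delimited sample from the question (input.tsv is a made-up name).
printf 'string1\tstring2\tstring3\t001\tstring4\n'     >  input.tsv
printf 'string5\tstring6\tstring7\t002\tstring8\n'     >> input.tsv
printf 'string9\tstring10\tstring11\t003\tstring12\n'  >> input.tsv
printf 'string13\tstring14\tstring15\t002\tstring16\n' >> input.tsv

# The same file is passed twice so that awk reads it in two passes.
result=$(awk '
BEGIN { FS = OFS = "\t" }
# Pass 1 (FNR == NR holds only while the first copy is read): collect values per key.
FNR == NR {
    A[$4] = (A[$4] == "") ? $NF : (A[$4] "," $NF)
    next
}
# Pass 2: print each key the first time it is seen, then drop it from the array.
($4 in A) {
    print $4, A[$4]
    delete A[$4]
}' input.tsv input.tsv)

printf '%s\n' "$result"
```

Reading the file twice trades some I/O for simplicity: no separate bookkeeping is needed to keep the keys in input order.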

Upvotes: 0
