Reputation: 33

How can I use awk to only print values in a given row of file1 if they match with a list in column 1 in file2?

I need to remove items from the second column onward in file1 if they are not in the list of items in file2, which has just one column. Fields are separated by spaces.

$ cat file1
ID1 item1 item2 item3 item4 item5 item6 
ID2 item4
ID3 item1 item5 item6
ID4 item2 item3

$ cat file2
item1
item3 
item6

Desired Output:

ID1 item1 item3 item6
ID2
ID3 item1 item6
ID4 item3

I have spent all day trying several scripts to get this to work. The simplest one is:

awk -F'[ ]' '
{
    s = $1
    seen[$1]++
    for(i=2; i<=NF; i++)
            if ($1 in seen[$i]) s = s " " $i
    print s
    delete seen
}
' file1 file2

I just end up with:

awk: cmd. line:6: (FILENAME=output6.o FNR=1) fatal: attempt to use a scalar value as array

Upvotes: 1

Answers (2)

David C. Rankin

Reputation: 84531

A slightly different approach stores only the records from file2 and then loops over the fields of file1, comparing each field from field 2 onward against them:

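awk '
  NR == FNR {              # first file (file2): record each allowed item
    a[$1] = 1
    next
  }
  {
    for (i=1; i<=NF; i++) {
      # always print field 1 (the ID); print later fields only if listed in file2
      if (i < 2 || $i in a)
        printf "%s%s", (i>1) ? OFS : "", $i   # no separator before field 1
    }
    print ""               # terminate the output line
  }
' file2 file1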

Example Output

ID1 item1 item3 item6
ID2
ID3 item1 item6
ID4 item3
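Both answers rely on the `NR == FNR` idiom to tell the two input files apart. One caveat worth knowing: if file2 happens to be empty, `NR == FNR` stays true while file1 is read, so every line of file1 is swallowed by the first block and nothing is printed. A sketch of a variant that instead compares the current file name with the first command-line argument (the function name `keep_items` is hypothetical, and it assumes the same file name is not passed twice):

```shell
# keep_items is a hypothetical helper name; call it as: keep_items file2 file1
keep_items() {
  awk '
    FILENAME == ARGV[1] { keep[$1]; next }  # matches only lines of the first file; an empty list adds no keys
    {
      out = $1                              # always keep the ID
      for (i = 2; i <= NF; i++)
        if ($i in keep)
          out = out OFS $i
      print out
    }
  ' "$1" "$2"
}
```

With an empty file2 this prints just the IDs (ID1 through ID4) instead of nothing.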

Upvotes: 1

markp-fuso

Reputation: 34054

For this particular case I find that keeping items is a bit easier to code than removing items, e.g.:

awk '
FNR==NR { keep[$1]; next }
        { out=$1
          for (i=2;i<=NF;i++) 
              if ($i in keep)
                 out=out OFS $i
          print out
        }
' file2 file1

This generates:

ID1 item1 item3 item6
ID2
ID3 item1 item6
ID4 item3
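For comparison, a rough sketch of the "remove items" direction: blank out each unwanted field, then force awk to re-split and rebuild the record so the leftover separators collapse. It assumes the default whitespace FS and OFS, and the extra `$0 = $0; $1 = $1` bookkeeping is what building a fresh `out` string avoids. The helper name `remove_items` is hypothetical:

```shell
# remove_items is a hypothetical helper name; call it as: remove_items file2 file1
remove_items() {
  awk '
    NR == FNR { keep[$1]; next }   # first file: the allow-list
    {
      for (i = 2; i <= NF; i++)
        if (!($i in keep))
          $i = ""                  # blank the field; $0 is rebuilt with extra spaces
      $0 = $0                      # re-split on the default FS, dropping the empty fields
      $1 = $1                      # rebuild $0 with a single OFS between fields
      print
    }
  ' "$1" "$2"
}
```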

Upvotes: 1
