Reputation: 33
I need to remove items from the second column onward in file1 if they are not in the list of items in file2, which has a single column. Both files are space-delimited.
$ cat file1
ID1 item1 item2 item3 item4 item5 item6
ID2 item4
ID3 item1 item5 item6
ID4 item2 item3
$ cat file2
item1
item3
item6
Desired Output:
ID1 item1 item3 item6
ID2
ID3 item1 item6
ID4 item3
I have tried all day with several scripts to get this to work. The one that seems the simplest is here:
awk -F'[ ]' '
{
s = $1
seen[$1]++
for(i=2; i<=NF; i++)
if ($1 in seen[$i]) s = s " " $i
print s
delete seen
}
' file1 file2
I just end up with:
awk: cmd. line:6: (FILENAME=output6.o FNR=1) fatal: attempt to use a scalar value as array
Upvotes: 1
Views: 99
Reputation: 84531
A slightly different approach that only stores records from file2 and simply loops over the fields in file1, comparing contents from field 2 on, could be:
awk '
NR == FNR {
a[$1] = 1
next
}
{
for (i=1; i<=NF; i++) {
if (i < 2 || $i in a)
printf "%s%s", (i>1) ? OFS : "", $i
}
print ""
}
' file2 file1
Example Output
ID1 item1 item3 item6
ID2
ID3 item1 item6
ID4 item3
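To try this end to end, here is a minimal sketch that recreates the sample files from the question in a scratch directory and runs the same awk logic on them. The temporary directory and the `result` variable are assumptions made for this illustration only. Note that the `NR == FNR` idiom relies on the first file (file2) being non-empty.

```shell
#!/bin/sh
# Recreate the sample data from the question in a scratch directory (illustrative only).
tmpdir=$(mktemp -d)
cat > "$tmpdir/file1" <<'EOF'
ID1 item1 item2 item3 item4 item5 item6
ID2 item4
ID3 item1 item5 item6
ID4 item2 item3
EOF
printf '%s\n' item1 item3 item6 > "$tmpdir/file2"

# file2 is read first (NR == FNR), so its items are known before file1 is filtered.
result=$(awk '
NR == FNR { a[$1] = 1; next }
{
    for (i = 1; i <= NF; i++)
        if (i < 2 || $i in a)
            printf "%s%s", (i > 1) ? OFS : "", $i
    print ""
}
' "$tmpdir/file2" "$tmpdir/file1")

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

The order of the file arguments matters: the lookup file goes first, the file to be filtered second.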
Upvotes: 1
Reputation: 34054
For this particular case I find that "keep items" is a bit easier to code than "remove items", e.g.:
awk '
FNR==NR { keep[$1]; next }
{ out=$1
for (i=2;i<=NF;i++)
if ($i in keep)
out=out OFS $i
print out
}
' file2 file1
This generates:
ID1 item1 item3 item6
ID2
ID3 item1 item6
ID4 item3
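For comparison with the attempt in the question, the error there appears to stem from `$1 in seen[$i]`: the `in` operator expects an array on its right-hand side, while `seen[$i]` refers to a single element. That script also never separates the two input files, so file2 is never loaded as a lookup list. The sketch below keeps the question's variable names (`s`, `seen`) and changes only those two points; the scratch directory and the `fixed` variable are assumptions for the sake of a runnable example.

```shell
#!/bin/sh
# Scratch copies of the sample files from the question (illustrative only).
tmpdir=$(mktemp -d)
printf '%s\n' 'ID1 item1 item2 item3 item4 item5 item6' 'ID2 item4' \
    'ID3 item1 item5 item6' 'ID4 item2 item3' > "$tmpdir/file1"
printf '%s\n' item1 item3 item6 > "$tmpdir/file2"

fixed=$(awk '
FNR == NR { seen[$1]; next }      # first file (file2): remember each allowed item
{
    s = $1                        # always keep the ID column
    for (i = 2; i <= NF; i++)
        if ($i in seen)           # test the field against the array itself
            s = s " " $i
    print s
}
' "$tmpdir/file2" "$tmpdir/file1")

printf '%s\n' "$fixed"
rm -rf "$tmpdir"
```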
Upvotes: 1