Reputation: 169
I have two very large files (File 1 and File 2). File 1 has many rows and columns; I am pasting only column 1 for the sake of simplicity. I want to print only those lines that are unique to File 1.
File 1:
AT1G01010.1
AT1G01020_P1
AT1G01020_P2
AT1G01040.2
AT1G01040_P1
AT1G01046.1
AT1G01050_ID7
File 2:
AT1G01010
AT1G01046
AT1G01050
Output:
AT1G01020_P1
AT1G01020_P2
AT1G01040.2
AT1G01040_P1
I have tried the comm command in Ubuntu, but it didn't work because it compares complete lines. So when it compares AT1G01010.1 with AT1G01010, it doesn't report anything in common.
Upvotes: 3
Views: 3580
Reputation: 246774
grep is the best answer.
With awk: use non-alphanumeric characters as the field separator, remember the first field of every line in file2, and print each line of file1 whose first field was not seen in file2.
gawk -F '[^[:alnum:]]' 'NR==FNR {f2[$1]; next} !($1 in f2)' file2 file1
Works with GNU awk.
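As a quick sanity check, here is a minimal sketch that recreates the sample data from the question in a scratch directory and runs the same two-pass logic. Two things here are my own assumptions and not part of the command above: the array name seen, and the separator [^A-Za-z0-9], which I swapped in for [^[:alnum:]] so the sketch also runs in awk implementations that lack POSIX character classes.

```shell
#!/bin/sh
# Recreate the question's sample data in a throwaway directory.
tmpdir=$(mktemp -d)
cat > "$tmpdir/file1" <<'EOF'
AT1G01010.1
AT1G01020_P1
AT1G01020_P2
AT1G01040.2
AT1G01040_P1
AT1G01046.1
AT1G01050_ID7
EOF
cat > "$tmpdir/file2" <<'EOF'
AT1G01010
AT1G01046
AT1G01050
EOF

# Split on any non-alphanumeric character so $1 is the bare ID.
# While reading file2 (NR==FNR), remember each ID and skip to the next line.
# While reading file1, print only lines whose ID was never remembered.
result=$(awk -F '[^A-Za-z0-9]' 'NR==FNR {seen[$1]; next} !($1 in seen)' \
    "$tmpdir/file2" "$tmpdir/file1")

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

On the sample data this should print the four lines listed under Output in the question. Note that file2 has to come first on the command line, since NR==FNR is only true while the first file is being read.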
Or with join (this needs a shell with process substitution, such as bash, and GNU sed for \+):
join -v1 <(sed 's/^[[:alnum:]]\+/& &/' file1 | sort -k 1,1) <(sort file2) | cut -d " " -f 2-
Upvotes: 0
Reputation: 2761
Try:
grep -Fvf file2 file1
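This prints the lines of file1 that do not contain any line of file2, whether the match is the whole line or only part of it. The -F option makes grep treat each line of file2 as a fixed string rather than a regular expression.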
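One thing to keep in mind is that a fixed-string match can hit anywhere in the line, so an ID from file2 that happens to be a substring of a longer, unrelated ID would also remove that line. If that matters for your data, a possible variant is to anchor each ID at the start of the line and require a non-alphanumeric character (or end of line) after it. The sketch below is only an illustration: the file names patterns, fixed and anchored are hypothetical, the two odd-looking IDs in file1 are made up to show the difference, and it assumes the IDs contain no regex metacharacters.

```shell
#!/bin/sh
tmpdir=$(mktemp -d)
# Two real IDs from the question plus two made-up ones that merely
# contain a file2 ID as a substring.
cat > "$tmpdir/file1" <<'EOF'
AT1G01010.1
AT1G01020_P1
XAT1G01046.1
AT1G010500_P1
EOF
cat > "$tmpdir/file2" <<'EOF'
AT1G01010
AT1G01046
AT1G01050
EOF

# Plain fixed-string exclusion: drops every line containing an ID anywhere.
fixed=$(grep -Fvf "$tmpdir/file2" "$tmpdir/file1")

# Turn each ID into an extended regex: ^ID followed by a non-alphanumeric
# character or the end of the line.
sed 's/.*/^&([^[:alnum:]]|$)/' "$tmpdir/file2" > "$tmpdir/patterns"
anchored=$(grep -Evf "$tmpdir/patterns" "$tmpdir/file1")

printf 'fixed:\n%s\n' "$fixed"
printf 'anchored:\n%s\n' "$anchored"
rm -rf "$tmpdir"
```

With -F only AT1G01020_P1 survives, while the anchored patterns also keep the two made-up lines. The anchored version loses the speed of fixed-string matching, so on very large files it may be noticeably slower.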
Upvotes: 7