Waqas Khokhar

Reputation: 169

Find unique lines between two files

I have two very large files (File 1 and File 2). File 1 has many rows and columns; I am pasting only column 1 for the sake of simplicity. I want to print only those lines which are unique to File 1.

File 1:

AT1G01010.1
AT1G01020_P1
AT1G01020_P2
AT1G01040.2
AT1G01040_P1
AT1G01046.1
AT1G01050_ID7

File 2:

AT1G01010
AT1G01046
AT1G01050

Output:

AT1G01020_P1
AT1G01020_P2
AT1G01040.2
AT1G01040_P1

I have tried the comm command in Ubuntu, but it didn't work because it compares complete lines. So when it compares AT1G01010.1 with AT1G01010, it doesn't report anything in common.

Upvotes: 3

Views: 3580

Answers (2)

glenn jackman

Reputation: 246774

grep is the best answer.

With awk: this uses any non-alphanumeric character as the field separator, remembers the contents of file2, and prints each line of file1 whose first field has not been seen in file2.

gawk -F '[^[:alnum:]]' 'NR==FNR {f2[$1]; next} !($1 in f2)' file2 file1

Works with GNU awk.
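
A minimal sketch of the same two-file idiom, run on the sample data from the question. It assumes the suffix always starts with . or _ as in the sample, so the narrower separator [._] stands in for the character class above and plain awk is enough. The scratch directory and the names tmp, seen and result are only illustrative.

```shell
# Recreate the sample data from the question in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' AT1G01010.1 AT1G01020_P1 AT1G01020_P2 AT1G01040.2 \
    AT1G01040_P1 AT1G01046.1 AT1G01050_ID7 > "$tmp/file1"
printf '%s\n' AT1G01010 AT1G01046 AT1G01050 > "$tmp/file2"

# Assumption: IDs are separated from their suffix by "." or "_" only.
result=$(awk -F '[._]' '
    # While reading the first file (file2): store each ID as an array key.
    NR == FNR { seen[$1]; next }
    # While reading the second file (file1): print lines whose ID was not stored.
    !($1 in seen)
' "$tmp/file2" "$tmp/file1")

printf '%s\n' "$result"
rm -r "$tmp"
```

Because the whole first field is compared, an ID in file2 only excludes lines of file1 that carry exactly that ID, not lines that merely contain it as a substring.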

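Or with join (needs a shell with process substitution, such as bash, and GNU sed for \+): sed prefixes each line of file1 with its leading alphanumeric ID as a join key, join -v1 keeps the file1 lines that have no partner in file2, and cut removes the added key again.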

join -v1 <(sed 's/^[[:alnum:]]\+/& &/' file1 | sort -k 1,1) <(sort file2) | cut -d " " -f 2-

Upvotes: 0

αғsнιη

Reputation: 2761

Try:

grep -Fvf file2 file1

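This prints the lines of file1 that are not matched, either wholly or partially, by any line of file2. -F treats each line of file2 as a fixed string, -f reads the patterns from file2, and -v inverts the match.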
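
As a quick check, here is a small sketch that rebuilds the sample files from the question in a scratch directory and runs the command on them. The directory and the variable names tmp and result are only illustrative.

```shell
# Recreate the sample data from the question in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' AT1G01010.1 AT1G01020_P1 AT1G01020_P2 AT1G01040.2 \
    AT1G01040_P1 AT1G01046.1 AT1G01050_ID7 > "$tmp/file1"
printf '%s\n' AT1G01010 AT1G01046 AT1G01050 > "$tmp/file2"

# -F: fixed strings, -f: read patterns from file2, -v: keep non-matching lines.
result=$(grep -Fvf "$tmp/file2" "$tmp/file1")

printf '%s\n' "$result"
rm -r "$tmp"
```

Note that the match is a substring match anywhere in the line, so a blank line in file2 would match every line of file1 and nothing would be printed.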

Upvotes: 7
