mwra

Reputation: 317

How to grep for an exact string match across 2 files

I have UTF-8 plain-text lists of usernames, one per line, in list1.txt and list2.txt. Note, in case it is pertinent, that usernames may contain regex metacharacters (e.g. ! ^ . ( and such) as well as spaces.

I want to get and save to matches.txt a list of all unique values occurring in both lists. I've little command line expertise but this almost gets me there:

grep -Ff list1.txt list2.txt > matches.txt

...but that is treating "jdoe" and "jdoe III" as a match, returning "jdoe III" as the matched value. This is incorrect for the task. I need the per-line pattern match to be the whole line, i.e. from ^ to $. I've tried adding the -x flag but that gets no matches at all (edit: see comment to accepted answer - I got the flag order wrong).

I'm on OS X 10.9.5 and I don't have to use grep; another command-line tool that solves the problem will do.

Upvotes: 0

Views: 4479

Answers (4)

sane

Reputation: 125

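grep -Fwf file1 file2 would match whole words. Note that -w only requires each match to sit on word boundaries, so "jdoe" would still match the line "jdoe III".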

Upvotes: 0

Accidental brine

Reputation: 369

A very simple and straightforward way to do it, one that doesn't require doing all sorts of crazy things with grep, is as follows:

cat list1.txt list2.txt | grep match > matches.txt
Not only that, but it's also easier to remember (especially if you regularly use cat).

Upvotes: 0

Adam Katz

Reputation: 16176

All you need to do is add the -x flag to your grep query:

grep -Fxf list1.txt list2.txt > matches.txt

The -x flag restricts matches to full-line matches (each PATTERN is treated as if it were anchored, ^PATTERN$). I'm not sure why your attempt at -x failed. Maybe you put it after the -f, which must be immediately followed by the pattern file (list1.txt here)?
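To make the difference concrete, here is a small sketch run against made-up data. The usernames and the temporary directory are assumptions for illustration only, not taken from the question. It runs the same command with and without -x so the two result files can be compared.

```shell
# Hypothetical sample lists in a scratch directory (names are invented).
demo=$(mktemp -d "${TMPDIR:-/tmp}/grepdemo.XXXXXX")
printf '%s\n' 'jdoe' 'a.b' 'mary (admin)' > "$demo/list1.txt"
printf '%s\n' 'jdoe III' 'jdoe' 'aXb' 'mary (admin)' > "$demo/list2.txt"

# -F only: patterns are fixed strings but may match anywhere in the line,
# so "jdoe III" is reported because it contains "jdoe".
grep -Ff "$demo/list1.txt" "$demo/list2.txt" > "$demo/substring.txt"

# -F with -x: a line is reported only if it equals a pattern exactly.
grep -Fxf "$demo/list1.txt" "$demo/list2.txt" > "$demo/matches.txt"

cat "$demo/matches.txt"
```

With this data, substring.txt should contain three lines (including "jdoe III") while matches.txt contains only "jdoe" and "mary (admin)". Because of -F, the "." in "a.b" is taken literally and does not match "aXb".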

Upvotes: 2

anubhava

Reputation: 785471

awk will be handier than grep here:

awk 'FNR==NR{a[$0]; next} $0 in a' list1.txt list2.txt > matches.txt

$0 is the line, FNR is the line number within the current file, and NR is the overall line number (the two are equal only while awk is reading the first file). a[$0] creates an entry in an associative array (hash) whose key is the line. next ensures that further clauses (the $0 in a) do not run if the current clause (the one that applies to the first file) did. $0 in a is true when the current line is a key in the array a, so only lines present in both files are displayed. The order is their order of occurrence in the second file.
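Since the question asks for unique values, note that the one-liner prints a line every time it occurs in the second file. A possible variant, sketched here under the assumption that duplicates may exist in list2.txt, adds a second array (called seen here, a name chosen for illustration) so each match is printed only once. The sample data and scratch directory are invented.

```shell
# Hypothetical sample lists; "mary (admin)" appears twice in the second one.
awkdemo=$(mktemp -d "${TMPDIR:-/tmp}/awkdemo.XXXXXX")
printf '%s\n' 'jdoe' 'mary (admin)' 'a.b' > "$awkdemo/list1.txt"
printf '%s\n' 'mary (admin)' 'jdoe III' 'jdoe' 'mary (admin)' > "$awkdemo/list2.txt"

# First file: remember every line as an array key.
# Second file: print a line only if it is a known key and has not been
# printed before (seen[$0]++ is 0, i.e. false, the first time only).
awk 'FNR==NR { a[$0]; next }
     ($0 in a) && !seen[$0]++' "$awkdemo/list1.txt" "$awkdemo/list2.txt" > "$awkdemo/matches.txt"

cat "$awkdemo/matches.txt"
```

With this data the output should be "mary (admin)" followed by "jdoe", each once, in the order they first appear in the second file.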

Upvotes: 1
