matt wilkie

Reputation: 18104

How can I show lines in common (reverse diff)?

I have a series of text files for which I'd like to know the lines in common rather than the lines which are different between them. Command line Unix or Windows is fine.

File foo:

linux-vdso.so.1 =>  (0x00007fffccffe000)
libvlc.so.2 => /usr/lib/libvlc.so.2 (0x00007f0dc4b0b000)
libvlccore.so.0 => /usr/lib/libvlccore.so.0 (0x00007f0dc483f000)
libc.so.6 => /lib/libc.so.6 (0x00007f0dc44cd000)

File bar:

libkdeui.so.5 => /usr/lib/libkdeui.so.5 (0x00007f716ae22000)
libkio.so.5 => /usr/lib/libkio.so.5 (0x00007f716a96d000)
linux-vdso.so.1 =>  (0x00007fffccffe000)

So, given these two files above, the output of the desired utility would be akin to file1:line_number, file2:line_number == matching text (just a suggestion; I really don't care what the syntax is):

foo:1, bar:3 == linux-vdso.so.1 =>  (0x00007fffccffe000)

Upvotes: 232

Views: 141841

Answers (9)

seren

Reputation: 316

This is pretty easy using diff's built-in --old-line-format, --new-line-format, and --unchanged-line-format options:

diff --old-line-format='' --unchanged-line-format='%L' --new-line-format='' old.txt new.txt
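
For example, run against the question's foo and bar this should print just the one shared line (diff matches by position, so if the common lines of two files appear in a different relative order, sort first, as in the last example below):

diff --old-line-format='' --unchanged-line-format='%L' --new-line-format='' foo bar
# prints: linux-vdso.so.1 =>  (0x00007fffccffe000)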

These options are also useful for finding the unique lines in one (or both) of the files. So, to show the lines in old.txt that aren't in new.txt:

diff --old-line-format='%L' --unchanged-line-format='' --new-line-format='' old.txt new.txt

If you want to compare the output of two commands, or pre-filter the files first, you can use process substitution:

# Show common lines, excluding lines containing "badtext"
diff --unchanged-line-format='%L' <(grep -v "badtext" old.txt) <(grep -v "badtext" new.txt)
# Sort the files before showing the common lines:
diff --unchanged-line-format='%L' <(sort old.txt) <(sort new.txt)

Upvotes: 1

Gurjeet Singh

Reputation: 2817

I think the diff utility itself, using its unified (-U) option, can be used to achieve this effect. Because the first column of diff's output marks whether a line is an addition (+), a deletion (-), or unchanged (a space), we can filter for the lines that haven't changed.

diff -U1000 file_1 file_2 | grep '^ '

The number 1000 is chosen arbitrarily; it just needs to be at least as large as the line count of the bigger file, so the whole comparison fits in a single hunk and every common line is emitted as context.

Here's the full, foolproof set of commands:

f1="file_1"
f2="file_2"

lc1=$(wc -l < "$f1")   # reading via stdin makes wc print only the count
lc2=$(wc -l < "$f2")
lcmax=$(( lc1 > lc2 ? lc1 : lc2 ))

diff -U$lcmax "$f1" "$f2" | grep '^ ' | less

# Alternatively, use this grep to ignore the lines starting
# with +, -, and @ signs.
#   grep -vE '^[+@-]'

If you want to include the lines that are just moved around, you can sort the input before diffing, like so:

f1="file_1"
f2="file_2"

lc1=$(wc -l < "$f1")
lc2=$(wc -l < "$f2")
lcmax=$(( lc1 > lc2 ? lc1 : lc2 ))

diff -U$lcmax <(sort "$f1") <(sort "$f2") | grep '^ ' | less
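
One caveat with both commands above: grep '^ ' keeps the one-column unified-diff prefix (a leading space) on every line. If you need the lines verbatim, strip it, for example:

diff -U$lcmax "$f1" "$f2" | grep '^ ' | sed 's/^ //'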

Upvotes: 12

ChristopheD

Reputation: 116297

It was asked here before: Unix command to find lines common in two files

You could also try with Perl (credit goes here):

perl -ne 'print if ($seen{$_} .= @ARGV) =~ /10$/' file1 file2
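
If the one-liner looks cryptic, here is the same logic spelled out with my annotations (the behavior is identical):

perl -ne '
    $n = @ARGV;                  # 1 while reading file1 (file2 still queued), 0 while reading file2
    $seen{$_} .= $n;             # build a per-line history string, e.g. "1", "11", "10"
    print if $seen{$_} =~ /10$/; # "...10" = seen in file1, first occurrence in file2
' file1 file2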

Upvotes: 37

Dan Lew

Reputation: 87430

On *nix, you can use comm. The answer to the question is:

comm -1 -2 file1.sorted file2.sorted 
# where file1 and file2 have been sorted and saved as file1.sorted and file2.sorted

Here's the full usage of comm:

comm [-1] [-2] [-3] file1 file2
-1 Suppress the output column of lines unique to file1.
-2 Suppress the output column of lines unique to file2.
-3 Suppress the output column of lines duplicated in file1 and file2. 

Also note that it is important to sort the files before using comm, as mentioned in the man pages.
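
For example, preparing and comparing the question's two files (-12 is shorthand for -1 -2):

sort foo > foo.sorted
sort bar > bar.sorted
comm -12 foo.sorted bar.sorted
# prints: linux-vdso.so.1 =>  (0x00007fffccffe000)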

Upvotes: 270

Shrike

Reputation: 66

In Windows, you can use a PowerShell script with Compare-Object:

Compare-Object -IncludeEqual -ExcludeDifferent -PassThru (Get-Content A.txt) (Get-Content B.txt) > MATCHING.txt  # find matching lines

Compare-Object flag behavior:

  • -IncludeEqual without -ExcludeDifferent: everything
  • -ExcludeDifferent without -IncludeEqual: nothing

Upvotes: 1

Gopu

Reputation: 1032

The easiest way to do it is:

awk 'NR==FNR{a[$0]++; next} a[$0]' file1 file2

The files do not need to be sorted; a[$0] keys on the whole line, so entire lines are compared rather than just the first field.
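
And if you want output close to the asker's suggested file:line format, a sketch along the same lines (hard-coding the question's file names foo and bar):

awk 'NR==FNR {line[$0]=FNR; next}
     ($0 in line) {printf "foo:%d, bar:%d == %s\n", line[$0], FNR, $0}' foo bar
# prints: foo:1, bar:3 == linux-vdso.so.1 =>  (0x00007fffccffe000)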

Upvotes: 14

Zivilyn Bane

Reputation: 71

Just for information, I made a little tool for Windows that does the same thing as "grep -F -x -f file1 file2" (as I hadn't found anything equivalent to this command on Windows).

Here it is: http://www.nerdzcore.com/?page=commonlines

Usage is "CommonLines inputFile1 inputFile2 outputFile"

Source code is also available (GPL).

Upvotes: 1

Ryder

Reputation: 1105

I found this answer on a question listed as a duplicate. I find grep more administrator-friendly than comm, so if you just want the set of matching lines (useful for comparing CSV files, for instance), simply use:

grep -F -x -f file1 file2

Or the shorter fgrep form (deprecated in current GNU grep in favor of grep -F):

fgrep -xf file1 file2

Plus, you can use file2* to glob and look for lines in common with multiple files, rather than just two.

Some other handy variations include

  • -n flag to show the line number of each matched line (see the example after this list)
  • -c to only count the number of lines that match
  • -v to display only the lines in file2 that differ (or use diff).
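
For example, -n on the question's files (the number is the matching line in bar):

grep -n -F -x -f foo bar
# prints: 3:linux-vdso.so.1 =>  (0x00007fffccffe000)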

Using comm is faster, but that speed comes at the expense of having to sort your files first. It isn't very useful as a 'reverse diff'.

Upvotes: 89

Greg Mueller

Reputation: 526

I just learned the comm command from the other answers, but I wanted to add something extra: if the files are not sorted and you don't want to touch the originals, you can feed comm the output of sort via process substitution. This leaves the original files intact. Process substitution works in Bash, but I can't say about other shells.

comm -1 -2 <(sort file1) <(sort file2)

This can be extended to compare command output, instead of files:

comm -1 -2 <(ls /dir1 | sort) <(ls /dir2 | sort)

Upvotes: 26
