Nir

Reputation: 914

How to diff two file lists while ignoring each line's position in the list

I have two lists of files that I want to diff. The second list has more files in it, and because both lists are in alphabetical order, diffing them reports files (lines) that exist in both lists but at different positions.

I want to diff these two lists while ignoring where each line sits in its list. That way I would get only the new or missing lines.

Thank you.

Upvotes: 17

Views: 31310

Answers (6)

No One in Particular

Reputation: 2894

Do the following:

cat file1 file2 | sort | uniq -u

This will give you the lines that are unique (i.e., lines that appear exactly once across both files).

Explanation:
1) cat file1 file2 will put all of the entries into one list
2) sort will sort the combined list
3) uniq -u will only output the entries which don't have duplicates
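
One caveat with the pipeline above: a line repeated inside a single file also counts as a duplicate, so it is dropped even if the other file does not contain it. Here is a minimal sketch that guards against this by de-duplicating each file first. The function name unique_lines is made up for illustration, and file1 and file2 are the same placeholder names used above.

```shell
# unique_lines FILE1 FILE2
# Print the lines that appear in exactly one of the two files.
# Each file is de-duplicated first (sort -u) so that a line repeated
# within one file is not mistaken for a line present in both.
unique_lines() {
    { sort -u "$1"; sort -u "$2"; } | sort | uniq -u
}
```

Call it as unique_lines file1 file2. Like the original one-liner, it does not say which of the two files each line came from.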

Upvotes: 17

antak

Reputation: 20759

A good fit here is the humble comm command, which compares two sorted files line by line.

To demonstrate, let's create two input files:

$ cat <<EOF >a
> a.txt
> b.txt
> c.txt
> EOF

$ cat <<EOF >b
> a.txt
> a1.txt
> b.txt
> b2.txt
> EOF

Now, using the comm command to get what the question wanted:

$ comm -3 a b
        a1.txt
        b2.txt
c.txt

This shows a columnar output with missing files (lines in a but not in b) in the first column and extra files (lines in b but not in a) in the second column.

What exactly does comm do?

Here's the output if the command is typed without any switches:

$ comm a b
                a.txt
        a1.txt
                b.txt
        b2.txt
c.txt

This shows three columns thus:

  1. Lines in a but not in b
  2. Lines in b but not in a
  3. Lines in both a and b

Each of the numbered switches -1, -2 and -3 hides the corresponding column from the output, and they can be combined.

So for example:

  • Specifying -12 results in common lines only
  • Specifying -13 results in lines only in b
  • Specifying -23 results in lines only in a
  • Specifying -3 results in the symmetric difference
  • Specifying -123 results in no output
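
Since comm expects sorted input, lists that are not already sorted need sorting first, and sort and comm have to agree on the collation order. The following is a rough sketch of one way to wrap that up, not the only way to do it. The function name compare_lists and the temporary file names are invented for this example, and forcing the C locale is an assumption made here to keep the ordering predictable.

```shell
# compare_lists MODE FILE1 FILE2
# MODE is handed straight to comm, for example:
#   -23  lines only in FILE1
#   -13  lines only in FILE2
#   -12  lines in both
#   -3   lines in either one but not both (two columns)
compare_lists() {
    mode=$1
    tmpdir=$(mktemp -d) || return 1
    # comm expects sorted input; use the same locale for sort and comm
    # so that both agree on what "sorted" means.
    LC_ALL=C sort "$2" > "$tmpdir/first"
    LC_ALL=C sort "$3" > "$tmpdir/second"
    LC_ALL=C comm "$mode" "$tmpdir/first" "$tmpdir/second"
    status=$?
    rm -rf "$tmpdir"
    return "$status"
}
```

For example, compare_lists -13 a b prints the files that are new in b, one per line and without leading tabs, because the other two columns are suppressed.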

Upvotes: 9

dogbane

Reputation: 274582

You can try this approach which involves "subtracting" the two lists as follows:

$ cat file1
a.txt
b.txt
c.txt

$ cat file2
a.txt
a1.txt
b.txt
b2.txt

1) print everything in file2 that is not in file1 i.e. file2 - file1

$ grep -vxFf file1 file2
a1.txt
b2.txt

2) print everything in file1 that is not in file2 i.e. file1 - file2

$ grep -vxFf file2 file1
c.txt

(You can then do what you want with these diffs, e.g. write them to a file or sort them.)

Descriptions of the grep options used:

  -v, --invert-match        select non-matching lines
  -x, --line-regexp         force PATTERN to match only whole lines
  -F, --fixed-strings       PATTERN is a set of newline-separated strings
  -f, --file=FILE           obtain PATTERN from FILE
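
The two grep calls can be combined into one small helper that labels each line with the direction of the change. This is only a sketch: the function name report_list_changes and the "- " and "+ " prefixes are choices made for this example. Note that -F matters for file names, because without it the dot in a.txt would be treated as a regular-expression wildcard.

```shell
# report_list_changes OLD NEW
# Print lines missing from NEW with a "- " prefix and
# lines added in NEW with a "+ " prefix, in file order.
report_list_changes() {
    # Lines of OLD that equal no whole line of NEW (OLD - NEW)
    grep -vxFf "$2" "$1" | sed 's/^/- /'
    # Lines of NEW that equal no whole line of OLD (NEW - OLD)
    grep -vxFf "$1" "$2" | sed 's/^/+ /'
}
```

With the sample files above, report_list_changes file1 file2 lists c.txt as removed and a1.txt and b2.txt as added. Neither input needs to be sorted for this to work.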

Upvotes: 24

Beano

Reputation: 7841

For the example you quoted, @Sparr:

a contains

a.txt
b.txt
c.txt

b contains

a.txt
a1.txt
b.txt
b2.txt

diff a b gives

1a2
> a1.txt
3c4
< c.txt
---
> b2.txt

What is it about this output that does not meet your needs?

Upvotes: 4

pyfunc

Reputation: 66709

Sorting the two lists before you diff them will give you a more useful diff.
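
As a rough illustration of that idea, the sketch below sorts copies of both lists, diffs them, and keeps only the added and removed lines. The function name sorted_diff and the temporary file names are hypothetical.

```shell
# sorted_diff FILE1 FILE2
# Sort copies of both lists, diff them, and keep only the
# "<" (only in FILE1) and ">" (only in FILE2) lines.
sorted_diff() {
    tmpdir=$(mktemp -d) || return 1
    sort "$1" > "$tmpdir/first"
    sort "$2" > "$tmpdir/second"
    # diff exits with status 1 when the lists differ, and grep exits
    # with status 1 when they do not; neither is an error here.
    diff "$tmpdir/first" "$tmpdir/second" | grep '^[<>]' || true
    rm -rf "$tmpdir"
}
```

Running sorted_diff a b drops the hunk headers (such as 1a2) and the --- separators, so what remains is just the list of files that differ.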

Upvotes: 1

Sparr

Reputation: 7712

If the lines are sorted, diff should catch the insertions and deletions just fine and only report the differences.

Upvotes: 0
