Reputation: 10689
I have two files with sorted lines. One file (B) is a subset of the other file (A). I would like to find all lines in A that ARE NOT in B. Ideally, I would like to create a file (C) that contains these lines. Is this possible in Unix? I'm looking for a one line command to do this instead of writing a script. I looked at the join and diff commands, but I could not find a command option to do this. Thanks for the help.
Upvotes: 6
Views: 2851
Reputation: 360065
This join command will do what you're asking:
join -v 1 fileA fileB > fileC
Demonstration:
$ cat fileA
a
c
d
g
h
t
u
v
z
$ cat fileB
a
d
g
t
u
z
$ join -v 1 fileA fileB
c
h
v
This assumes sorted files, as you stated in your question. For unsorted files (in a shell that supports process substitution, such as bash):
join -v 1 <(sort fileA) <(sort fileB)
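For a shell without process substitution, the same idea can be sketched with temporary files. This is only an illustration under stated assumptions: the workdir variable, the .sorted file names, and the sample contents are made up for the example, and LC_ALL=C is set so that sort and join agree on the collation order.

```shell
# Sketch: lines of fileA that have no match in fileB, using join -v 1.
# workdir, the *.sorted names, and the sample data are hypothetical.
LC_ALL=C
export LC_ALL                      # same collation for sort and join

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT      # remove the temporary directory on exit

# Sample data taken from the demonstration above.
printf '%s\n' a c d g h t u v z > "$workdir/fileA"
printf '%s\n' a d g t u z       > "$workdir/fileB"

# Sort copies first, in case the inputs are not already sorted.
sort "$workdir/fileA" > "$workdir/fileA.sorted"
sort "$workdir/fileB" > "$workdir/fileB.sorted"

# -v 1 prints only the unpairable lines from the first file.
join -v 1 "$workdir/fileA.sorted" "$workdir/fileB.sorted" > "$workdir/fileC"
cat "$workdir/fileC"
```

Keep in mind that join matches on the first whitespace-separated field by default, so this suits single-word lines like the ones in the demonstration; lines with several fields would be compared on their first field only.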
Upvotes: 1
Reputation: 2497
Awk Solution
Input files
File a:
aaa
bbb
ccc
File b:
ccc
ddd
eel
Code
awk ' NR==FNR { A[$0]=1; next; }
{ if ($0 in A) { A[$0]=0; } }
END { for (k in A) { if (A[k]==1) { print k; } } } ' a b > c
c (output file; awk's for-in order is unspecified, so the lines may come out in any order)
bbb
aaa
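If the original order of a should be preserved, a common awk idiom is to read b first and then print only the lines of a that were never seen. The following is a minimal sketch, not the answer's own method; the awkdir variable and the array name seen are assumptions made for the example.

```shell
# Sketch: print lines of a that do not occur in b, keeping a's order.
# awkdir and the array name "seen" are hypothetical.
awkdir=$(mktemp -d)
trap 'rm -rf "$awkdir"' EXIT       # remove the temporary directory on exit

# Same sample data as the input files above.
printf '%s\n' aaa bbb ccc > "$awkdir/a"
printf '%s\n' ccc ddd eel > "$awkdir/b"

# While reading the first file (b), NR == FNR holds: remember each line.
# For the second file (a), print the line only if it was not remembered.
awk 'NR == FNR { seen[$0]; next } !($0 in seen)' "$awkdir/b" "$awkdir/a" > "$awkdir/c"
cat "$awkdir/c"
```

Note that b is given first on the command line. One edge case to watch: if b is empty, NR == FNR stays true while a is read, so nothing is printed.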
Upvotes: 0
Reputation: 51147
You can do this with diff as well. Unlike @johlo's grep answer, diff cares about order, and unlike @johnshen64's comm answer, it works on non-sorted files:
$ cat a
a
b
c
d
e
$ cat b
a
b
f
d
e
$ diff -dbU0 a b
--- a 2012-05-18 16:02:30.603386016 -0400
+++ b 2012-05-18 16:02:45.547817122 -0400
@@ -3 +3 @@
-c
+f
So you can use a pipeline to get just the omitted lines, taking order into account:
$ diff -dbU0 a b | tail -n +4 | grep ^- | cut -c2-
c
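To write the result to a file, as the question asks, the pipeline above could be wrapped roughly as follows. This is a sketch under assumptions: the diffdir variable is hypothetical, the sample files mirror the ones shown above, and the grep pattern is quoted so the shell passes it through unchanged.

```shell
# Sketch: collect the lines that diff reports as removed from a into file c.
# diffdir is a hypothetical temporary directory used only for this example.
diffdir=$(mktemp -d)
trap 'rm -rf "$diffdir"' EXIT      # remove the temporary directory on exit

printf '%s\n' a b c d e > "$diffdir/a"
printf '%s\n' a b f d e > "$diffdir/b"

# diff exits with status 1 when the files differ; that is expected here.
# tail drops the two file-header lines and the first hunk header, grep keeps
# the removed lines, and cut strips the leading "-" marker.
diff -dbU0 "$diffdir/a" "$diffdir/b" | tail -n +4 | grep '^-' | cut -c2- > "$diffdir/c"
cat "$diffdir/c"
```

Any later hunk headers start with @@, so the grep filter discards them along with the added lines.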
Upvotes: 3