Reputation: 11
I have two text files, file1.txt and file2.txt.
file1.txt contains a list of numbers. file2.txt also contains a list of numbers, but more of them (a good chunk are numbers from file1.txt). This is what I am trying to do:
I want to remove all the numbers in file1.txt from file2.txt and save the output to file3.txt, so that file3.txt contains none of the numbers from file1.txt. How can I accomplish this?
Upvotes: 1
Views: 2996
Reputation: 47099
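You want to print only the lines that are unique to file2.txt. This is what the comm utility is designed for. It expects sorted input; -1 suppresses the lines found only in file1.txt and -3 suppresses the lines common to both files, so only the lines unique to file2.txt remain (the <(...) process substitution needs bash, ksh or zsh):
comm -13 <(sort file1.txt) <(sort file2.txt) > file3.txt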
Testing
$ cat file1.txt
5
4
6
2
10
$ cat file2.txt
3
7
8
2
4
1
9
10
5
6
$ comm -13 <(sort file1.txt) <(sort file2.txt)
1
3
7
8
9
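A portable, self-contained version of the test above, writing the result to file3.txt as the question asks. It is a sketch under two assumptions: the C locale is forced so that sort and comm agree on ordering (comm expects its input sorted the same way it compares lines), and temporary file1.sorted / file2.sorted files (hypothetical names) replace process substitution so it also runs under a plain POSIX sh.

```shell
# Sample data from the test above: one number per line.
printf '%s\n' 5 4 6 2 10 > file1.txt
printf '%s\n' 3 7 8 2 4 1 9 10 5 6 > file2.txt

# Assumption: force the C locale so sort and comm use the same collation.
LC_ALL=C
export LC_ALL

# Sort into temporary files instead of using <(...), so plain sh works too.
sort file1.txt > file1.sorted
sort file2.txt > file2.sorted

# -1 hides lines only in file1, -3 hides lines present in both files,
# leaving only the lines unique to file2.txt.
comm -13 file1.sorted file2.sorted > file3.txt

# comm's output is in sort order ("10" sorts before "2"); show it numerically.
sort -n file3.txt
```

Because comm works on sorted input, file3.txt comes out in sort order rather than in file2.txt's original order.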
Upvotes: 1
Reputation: 753615
With GNU grep, you can use the 'fgrep' (fixed-string, -F) mode:
grep -F -v -f file1.txt -w file2.txt > file3.txt
Demo:
seq 1 30 > file2.txt
for i in 1 2 3 4 5; do echo $RANDOM; done | sed 's/\(..\).*/\1/' > file1.txt
grep -F -v -f file1.txt -w file2.txt > file3.txt
The contents of file2.txt are the numbers 1 through 30, one per line. The content of file1.txt is 5 semi-random 2-digit numbers. The output in file3.txt is the lines of file2.txt that are not in file1.txt. Note that the random numbers generated by the loop are not very good, nor constrained to 1..30 (see also the comments just below).
The feature used here that is not in POSIX is the -w flag, which matches whole words (GNU grep has it, as do the BSD versions). Interestingly, POSIX 2008 specifies that -x should match exact lines, and the -x option works correctly for me (on Mac OS X 10.7.5, where /usr/bin/grep is GNU grep 2.5.1). In theory, -x is more portable: since it was in the POSIX 1997 standard too, it should be widely available. The -w option would be more appropriate if there were multiple numbers on a single line (but grep would still eliminate whole lines). One of the two is needed either way: without it, a pattern such as 1 matches as a substring and also removes 10, 11, 21, and so on.
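A small demonstration, with made-up data, of what a whole-line match changes when combined with -F: without it, each line of file1.txt is treated as a substring pattern. This sketch uses the POSIX -x option; the file name file3_substring.txt is hypothetical.

```shell
# Hypothetical sample data: file2.txt holds 1..12, file1.txt holds 1 and 5.
seq 1 12 > file2.txt
printf '%s\n' 1 5 > file1.txt

# Substring matching only: "1" also matches 10, 11 and 12.
grep -F -v -f file1.txt file2.txt > file3_substring.txt

# Whole-line matching (POSIX -x): only the exact lines 1 and 5 are removed.
grep -F -x -v -f file1.txt file2.txt > file3.txt

cat file3.txt
```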
Upvotes: 4
Reputation: 54392
Here's one way using awk:
awk 'FNR==NR { a[$0]; next } !($0 in a)' file1.txt file2.txt > file3.txt
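This reads file1.txt into an array: FNR==NR is true only while awk is reading the first file, so each of its lines is stored as an array key and next skips straight to the following line. Then, while iterating through file2.txt, it prints the lines that are not in the array, and the shell redirection writes them to file3.txt. If you have any questions, don't hesitate to ask. Cheers.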
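A runnable check of the one-liner, using the sample numbers from the comm answer's test. Unlike comm, it keeps file2.txt's original order. One caveat with FNR==NR: if file1.txt is empty, the condition also holds for every line of file2.txt, so nothing is printed. The FILENAME == ARGV[1] variant at the end is a common alternative, not part of the original answer, and the empty.txt / file3_*.txt names are hypothetical.

```shell
# Sample data from the comm answer's test.
printf '%s\n' 5 4 6 2 10 > file1.txt
printf '%s\n' 3 7 8 2 4 1 9 10 5 6 > file2.txt

# The one-liner from the answer: output keeps file2.txt's order.
awk 'FNR==NR { a[$0]; next } !($0 in a)' file1.txt file2.txt > file3.txt

# Pitfall: with an empty first file, FNR==NR also holds for every
# line of file2.txt, so every line is stored and nothing is printed.
: > empty.txt
awk 'FNR==NR { a[$0]; next } !($0 in a)' empty.txt file2.txt > file3_broken.txt

# Variant: compare the current file name with the first argument instead,
# which also behaves correctly when the first file is empty.
awk 'FILENAME == ARGV[1] { a[$0]; next } !($0 in a)' empty.txt file2.txt > file3_empty.txt
```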
Upvotes: 6
Reputation: 824
Can you give a little more information about how these numbers are formatted? Are each of them on a new line? Are they all the same number of digits?
EDIT: After receiving a comment:
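# For each line of file2.txt, scan file1.txt for an identical line
while IFS= read -r line
do
    found=false
    while IFS= read -r secLine
    do
        # Use POSIX "=" rather than "==" inside [ ]
        if [ "$line" = "$secLine" ]
        then
            found=true
            break
        fi
    done < file1.txt
    # Keep the line only if it was not found in file1.txt
    if [ "$found" = false ]
    then
        printf '%s\n' "$line"
    fi
done < file2.txt > file3.txt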
That should work, albeit by brute force (check for syntax errors; I didn't see any, but there may be some). Because every line of file2.txt is compared with every line of file1.txt, it may take a while depending on how many numbers you have.
Upvotes: 0
Reputation: 5091
You can use the Unix diff command to get the difference and filter out the unwanted lines. GNU diff's --changed-group-format and --unchanged-group-format options select which data is printed.
Each option takes one of the following three formats:
'%<' prints the group's lines from FILE1
'%>' prints the group's lines from FILE2
'' (empty string) prints nothing, removing the lines from both files
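For example, sorting both files first (diff pairs lines up by their order, so unsorted input can leave shared numbers in the output):
diff --changed-group-format="%>" --unchanged-group-format="" <(sort file1.txt) <(sort file2.txt) > file3.txt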
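A self-contained sketch of the sorted approach, assuming GNU diff (the group-format options are GNU extensions). Sorting matters because diff aligns lines by their order: on the unsorted sample data from the comm answer, only some of the shared numbers would line up and the rest would leak into file3.txt. Temporary file1.sorted / file2.sorted files (hypothetical names) stand in for process substitution so it also runs under plain sh.

```shell
# Sample data from the comm answer's test (unsorted, as in the question).
printf '%s\n' 5 4 6 2 10 > file1.txt
printf '%s\n' 3 7 8 2 4 1 9 10 5 6 > file2.txt

# Sort both inputs the same way so that shared numbers line up for diff.
sort file1.txt > file1.sorted
sort file2.txt > file2.sorted

# Print the file2 side of each changed group and drop unchanged groups.
# Exit status 1 only means "files differ", so treat it as success.
diff --changed-group-format='%>' --unchanged-group-format='' \
    file1.sorted file2.sorted > file3.txt || [ $? -eq 1 ]
```

When only --changed-group-format is given, GNU diff also uses it for groups that contain only insertions or only deletions, which is why numbers present only in file2.txt still appear.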
Upvotes: 1