Reputation: 125
I have two files, test1.txt and test2.txt.
test1.txt contains:
abc.cde.ccd.eed.12345.5678.txt
abcd.cdde.ccdd.eaed.12346.5688.txt
aabc.cade.cacd.eaed.13345.5078.txt
abzc.cdae.ccda.eaed.29345.1678.txt
abac.cdae.cacd.eead.18145.2678.txt
aabc.cdve.cncd.ened.19945.2345.txt
and test2.txt contains:
12345.5678.txt
29345.1678.txt
18145.2678.txt
10111.2222.txt
I want to compare these two files in bash and get output like this:
In both:
abc.cde.ccd.eed.12345.5678.txt
abzc.cdae.ccda.eaed.29345.1678.txt
abac.cdae.cacd.eead.18145.2678.txt
Only in test1.txt:
abcd.cdde.ccdd.eaed.12346.5688.txt
aabc.cade.cacd.eaed.13345.5078.txt
aabc.cdve.cncd.ened.19945.2345.txt
Only in test2.txt:
10111.2222.txt
Upvotes: 2
Views: 357
Reputation: 4238
The following AWK script, script.awk, also does the job:
# First file (NR == FNR only while reading it): store every line in order
NR == FNR { lines[++i] = $0 }
# Second file: store every line as a pattern
NR > FNR { patterns[++j] = $0 }
END {
    # Mark each line and pattern where the pattern is a substring of the line
    for (p = 1; p <= j; p++)
        for (l = 1; l <= i; l++)
            if (index(lines[l], patterns[p]) > 0) {
                lines_match[l] = 1
                patterns_match[p] = 1
            }
    # Numeric loops keep input order; "for (x in array)" order is unspecified
    print "Lines only in first file:"
    for (l = 1; l <= i; l++)
        if (!(l in lines_match))
            print lines[l]
    print "Lines only in second file:"
    for (p = 1; p <= j; p++)
        if (!(p in patterns_match))
            print patterns[p]
    print "Lines in both files:"
    for (l = 1; l <= i; l++)
        if (l in lines_match)
            print lines[l]
}
It can be called as follows:
awk -f script.awk test1.txt test2.txt
Note that the script makes no assumptions about the structure of the data in the two files: it simply treats each line of test2.txt as a potential substring of the lines in test1.txt. Because it compares every pair of lines, its running time grows with the product of the two files' line counts.
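If every test2.txt line is exactly the last three '.'-separated fields of a test1.txt line (an assumption that holds for the sample data, but that the script above does not need), the pairwise comparison can be replaced by array lookups, so each file is read once. A minimal sketch, wrapped in a hypothetical shell function compare_suffix that prints the sections in the order the question asks for:

```shell
#!/bin/bash
# Hypothetical helper: compare_suffix FILE1 FILE2
# Assumes the key of a FILE1 line is its last three '.'-separated fields
# (e.g. 12345.5678.txt) and that every FILE2 line is such a key.
compare_suffix() {
    awk -F. '
        # First file: keep each full line and its key, in input order
        NR == FNR {
            if (NF < 3) next    # skip lines too short to contain a key
            n++
            line[n] = $0
            key[n] = $(NF-2) "." $(NF-1) "." $NF
            next
        }
        # Second file: each line is a key; remember its order and membership
        { pat[++m] = $0; seen2[$0] = 1 }
        END {
            # Set of keys present in the first file
            for (k = 1; k <= n; k++) seen1[key[k]] = 1
            print "In both:"
            for (k = 1; k <= n; k++) if (key[k] in seen2) print line[k]
            print "Only in " ARGV[1] ":"
            for (k = 1; k <= n; k++) if (!(key[k] in seen2)) print line[k]
            print "Only in " ARGV[2] ":"
            for (k = 1; k <= m; k++) if (!(pat[k] in seen1)) print pat[k]
        }
    ' "$1" "$2"
}
```

Usage: compare_suffix test1.txt test2.txt. Keying on the suffix also avoids accepting 12345.5678.txt inside a longer number such as 912345.5678.txt, which a plain substring test would match.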
Upvotes: 0
Reputation: 1296
This can be solved with comm from GNU Coreutils.
First, sort the second file in place (comm needs sorted input):
sort -o test2.txt test2.txt
Then use these commands to show the lines:
# unique to test1.txt
cut -d '.' -f 1-4 --complement test1.txt | sort | comm -23 - test2.txt
# unique to test2.txt
cut -d '.' -f 1-4 --complement test1.txt | sort | comm -13 - test2.txt
# that appear in both files
cut -d '.' -f 1-4 --complement test1.txt | sort | comm -12 - test2.txt
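Explanation:
# 1. Print every field after the fourth '.' of each test1.txt line (e.g. 12345.5678.txt);
#    --complement is a GNU extension. Note that the output is this trailing part, not the full test1.txt line
cut -d '.' -f 1-4 --complement test1.txt
# 2. '-' makes comm read its first input from standard input;
#    -1, -2 and -3 suppress the lines unique to the first input, unique to the second input, and common to both
comm -3 - test2.txt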
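A variant of the same idea, sketched under the assumption that bash is available for the <(...) process substitution: sorting both inputs on the fly leaves test2.txt unmodified, and cut -d . -f 5- (keep field 5 onward) is a portable equivalent of -f 1-4 --complement. Like the commands above, it prints the trailing keys rather than the full test1.txt lines. The function name comm_keys is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: comm_keys MODE FILE1 FILE2
# MODE is a comm suppression flag: -23 (only FILE1), -13 (only FILE2), -12 (both).
comm_keys() {
    local mode=$1 file1=$2 file2=$3
    # comm needs both inputs sorted under the same collation, so pin it to C
    LC_ALL=C comm "$mode" \
        <(cut -d . -f 5- "$file1" | LC_ALL=C sort) \
        <(LC_ALL=C sort "$file2")
}
```

For example, comm_keys -12 test1.txt test2.txt prints the keys present in both files.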
Upvotes: 0
Reputation: 5298
File1:
abc.cde.ccd.eed.12345.5678.txt
abcd.cdde.ccdd.eaed.12346.5688.txt
aabc.cade.cacd.eaed.13345.5078.txt
abzc.cdae.ccda.eaed.29345.1678.txt
abac.cdae.cacd.eead.18145.2678.txt
aabc.cdve.cncd.ened.19945.2345.txt
File2:
12345.5678.txt
29345.1678.txt
18145.2678.txt
10111.2222.txt
#!/bin/bash
# Start from empty output files (File1.txt is rewritten by comm at the end)
: > Both.txt
: > File2.txt

while IFS= read -r f2line
do
    found=0
    while IFS= read -r f1line
    do
        # -F: match the File2 line as a fixed string, so '.' is not a regex wildcard
        if printf '%s\n' "$f1line" | grep -qF -- "$f2line"
        then
            found=1
            printf '%s\n' "$f1line" >> Both.txt
        fi
    done < File1
    # No line of File1 contains this File2 line
    if [ "$found" -eq 0 ]
    then
        printf '%s\n' "$f2line" >> File2.txt
    fi
done < File2

# Lines of File1 that never matched (sort -u drops repeats if a line matched twice)
sort -u Both.txt > s_Both.txt
sort File1 > s_File1
comm -23 s_File1 s_Both.txt > File1.txt
rm s_File1 s_Both.txt
Output files: Both.txt (lines in both files), File1.txt (lines only in File1) and File2.txt (lines only in File2).
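The script above starts one grep for every pair of lines. Assuming bash 4 or newer (for associative arrays) and that the key of a File1 line is everything after its fourth '.', a sketch that reads each file once and keeps input order could look like this; compare_assoc is a hypothetical name, and it prints to standard output instead of writing Both.txt, File1.txt and File2.txt:

```shell
#!/bin/bash
# Hypothetical helper: compare_assoc FILE1 FILE2 (needs bash >= 4 for local -A).
# Assumes each FILE1 key is the text after its fourth '.', e.g. 12345.5678.txt.
compare_assoc() {
    local -A keys1=()
    local -A keys2=()
    local -a lines1=()
    local -a lines2=()
    local line key

    # Read FILE2 once: each non-empty line is a key
    while IFS= read -r line; do
        [[ -z $line ]] && continue
        lines2+=("$line")
        keys2["$line"]=1
    done < "$2"

    # Read FILE1 once: strip the shortest prefix that ends at the fourth '.'
    while IFS= read -r line; do
        [[ -z $line ]] && continue
        lines1+=("$line")
        key=${line#*.*.*.*.}
        keys1["$key"]=1
    done < "$1"

    echo "In both:"
    for line in "${lines1[@]}"; do
        key=${line#*.*.*.*.}
        if [[ -n ${keys2["$key"]:-} ]]; then printf '%s\n' "$line"; fi
    done
    echo "Only in $1:"
    for line in "${lines1[@]}"; do
        key=${line#*.*.*.*.}
        if [[ -z ${keys2["$key"]:-} ]]; then printf '%s\n' "$line"; fi
    done
    echo "Only in $2:"
    for line in "${lines2[@]}"; do
        if [[ -z ${keys1["$line"]:-} ]]; then printf '%s\n' "$line"; fi
    done
}
```

Usage: compare_assoc File1 File2.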
Upvotes: 0
Reputation: 88646
In both (-F makes grep treat each test2.txt line as a fixed string, so '.' matches only a literal dot):
grep -F -f test2.txt test1.txt
Output:
abc.cde.ccd.eed.12345.5678.txt
abzc.cdae.ccda.eaed.29345.1678.txt
abac.cdae.cacd.eead.18145.2678.txt
Only in test1.txt:
grep -v -F -f test2.txt test1.txt
Output:
abcd.cdde.ccdd.eaed.12346.5688.txt
aabc.cade.cacd.eaed.13345.5078.txt
aabc.cdve.cncd.ened.19945.2345.txt
Only in test2.txt:
grep -v -F -f <(grep -Eo '[0-9]+\.[0-9]+\.txt' test1.txt) test2.txt
Output:
10111.2222.txt
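grep -f matches a test2.txt line anywhere inside a test1.txt line, so 12345.5678.txt would also match a line ending in 912345.5678.txt. If that matters, one option (a sketch, assuming the keys contain no regex metacharacters other than '.') is to turn each key into an end-anchored basic regex; anchored_patterns is a hypothetical helper name:

```shell
#!/bin/bash
# Hypothetical helper: anchored_patterns FILE2
# Turns each non-empty line such as 12345.5678.txt into the basic regex
# \.12345\.5678\.txt$ so it only matches as the final '.'-separated part of a line.
# Assumes the keys contain no regex metacharacters besides '.' (no '*', '[', ...).
anchored_patterns() {
    sed -e '/^$/d' -e 's/[.]/\\./g' -e 's/.*/\\.&$/' "$1"
}
```

For example, grep -f <(anchored_patterns test2.txt) test1.txt lists the lines in both files, and adding -v lists the lines only in test1.txt.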
Upvotes: 3