Reputation: 21
I have two files. I am trying to remove every line in file2 that matches a value found in file1. The first file is a list like this:
File1
ZNI008
ZNI009
ZNI010
ZNI011
ZNI012
... over 19463 lines
The second file contains lines that include the items listed in the first:
File2
copy /Y \\server\foldername\version\20050001_ZNI008_162635.xml \\server\foldername\version\folder\
copy /Y \\server\foldername\version\20050001_ZNI010_162635.xml \\server\foldername\version\folder\
copy /Y \\server\foldername\version\20050001_ZNI012_162635.xml \\server\foldername\version\folder\
copy /Y \\server\foldername\version\20050001_ZNI009_162635.xml \\server\foldername\version\folder\
... continues listing until line 51360
What I've tried so far:
grep -v -i -f file1.txt file2.txt > f3.txt
This does not write any output to f3.txt or remove any lines. I verified this by running
wc -l file2.txt
and the result is
51360 file2.txt
I believe the reason is that there are no exact matches. When I run the following, it shows nothing:
comm -1 -2 file1.txt file2.txt
Running
( tr '\0' '\n' < file1.txt; tr '\0' '\n' < file2.txt ) | sort | uniq -c | egrep -v '^ +1'
shows only one match, even though I can clearly see there is more than one match.
Alternatively, putting all the data into one file and running the following:
grep -Ev "$(cat file1.txt)" 1>LinesRemoved.log
fails with a message saying the argument has too many lines to process.
I need to remove lines matching the items in file1 from file2.
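One possible cause of the empty f3.txt, offered as a guess because the real file1.txt is not shown: a blank line in a `grep -f` pattern file acts as an empty pattern that matches every line, so `-v` then prints nothing. Stray carriage returns (Windows line endings) cause the opposite problem, since a pattern such as `ZNI008\r` never matches in the middle of a path. The sketch below counts both kinds of line; the function name `inspect_patterns` is hypothetical and Python 3 is assumed.

```python
def inspect_patterns(lines):
    """Count pattern lines that commonly break `grep -f` (hypothetical helper)."""
    blank = 0            # lines that are empty once the line ending is removed
    carriage_return = 0  # lines that carry a '\r', e.g. CRLF line endings
    for line in lines:
        if line.rstrip("\r\n") == "":
            blank += 1
        if "\r" in line:
            carriage_return += 1
    return {"blank": blank, "carriage_return": carriage_return}
```

To run it on the real file, open it with `open("file1.txt", newline="")` so that Python keeps the original line endings, and pass the file object to `inspect_patterns`.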
I am also trying this in Python:
#!/usr/bin/python
s = set()
# load each line of file1 into memory as elements of a set, 's'
f1 = open("file1.txt", "r")
for line in f1:
    s.add(line.strip())
f1.close()
# open file2 and split each line on "_" separator,
# second field contains the value ZNIxxx
f2 = open("file2.txt", "r")
for line in f2:
    if line[0:4] == "copy":
        fields = line.split("_")
        # check if the field exists in the set 's'
        if fields[1] not in s:
            match = line
        else:
            match = 0
    else:
        if match:
            print match, line,
It is not working well; I am getting this error:
Traceback (most recent call last):
  File "./test.py", line 14, in ?
    if fields[1] not in s:
IndexError: list index out of range
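The IndexError means that at least one line starting with "copy" contains no "_" at all, so `line.split("_")` returns a single-element list and `fields[1]` does not exist. A minimal Python 3 sketch of the same set-lookup idea with that case guarded is shown below. The names `remove_matching` and `filter_file` are hypothetical, and the code assumes that the ZNI value is always the second "_"-separated field, which holds for the sample lines but would break if a server or folder name contained an underscore.

```python
def remove_matching(ids, lines):
    """Yield the lines whose second '_'-separated field is NOT in `ids`."""
    # Strip whitespace (including any '\r') and ignore blank entries.
    unwanted = {entry.strip() for entry in ids if entry.strip()}
    for line in lines:
        fields = line.split("_")
        # Lines without an underscore have no ID field; keep them unchanged.
        if len(fields) > 1 and fields[1] in unwanted:
            continue
        yield line


def filter_file(ids_path, src_path, dst_path):
    """Write src_path to dst_path, minus the lines whose ID is listed in ids_path."""
    with open(ids_path) as ids, open(src_path) as src, open(dst_path, "w") as dst:
        dst.writelines(remove_matching(ids, src))
```

For the files in the question this would be called as `filter_file("file1.txt", "file2.txt", "file3.txt")`. The set lookup keeps it to a single pass over file2, regardless of how many IDs file1 holds.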
Upvotes: 2
Views: 5348
Reputation: 572
I like the grep solution from byrondrossos better, but here's another option:
sed $(awk '{printf("-e /%s/d ", $1)}' file1) file2 > file3
Upvotes: 1
Reputation: 278
This is admittedly ugly, but it does work. However, the path must be the same on every line (except, of course, for the ZNI### portion). Everything but the ZNI### part of the path is stripped so that grep -vf can run correctly on the sorted files.
First, convert "testfile2" to "testfileconverted" so that it contains only the ZNI### values:
cat /testfile2 | sed 's:^.*_ZNI:ZNI:g' | sed 's:_.*::g' > /testfileconverted
Second, run an inverse grep of the converted file against "testfile1", then rebuild the full command lines and write the result to "testfile3":
bash -c 'grep -vf <(sort /testfileconverted) <(sort /testfile1)' | sed "s:^:\copy /Y \\\|server\\\foldername\\\version\\\20050001_:g" | sed "s:$:_162635\.xml \\\|server\\\foldername\\\version\\\folder\\\:g" | sed "s:|:\\\:g" > /testfile3
Upvotes: 0
Reputation: 17594
This uses Bash and GNU sed because of the -i switch:
cp file2 file3
while read -r; do
sed -i "/$REPLY/d" file3
done < file1
There is surely a better way, but here's a hack that avoids -i :D
cp file2 file3
while read -r; do
(rm file3; sed "/$REPLY/d" > file3) < file3
done < file1
This exploits shell evaluation order: the outer < file3 redirection opens the file before rm and the inner > file3 run.
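The trick depends on POSIX file semantics: removing a name does not destroy a file that is still open, so the old contents stay readable while a new file of the same name is written. The Python 3 sketch below mirrors that sequence for one pattern. The function name `rewrite_without` is hypothetical, it uses a plain substring test where sed uses a regular expression, and it is expected to fail on Windows, where an open file cannot be removed.

```python
import os


def rewrite_without(path, pattern):
    """Rewrite `path` in place, dropping every line that contains `pattern`."""
    with open(path) as old:              # like `< file3`: opened first
        os.remove(path)                  # like `rm file3`: the name goes, the open file stays
        with open(path, "w") as new:     # like `> file3`: a brand-new file with the same name
            for line in old:
                if pattern not in line:  # substring test, not a regex as in sed
                    new.write(line)
```

As with the shell version, the file is fully rewritten once per pattern, so looping over all of file1 this way is slow compared with a single pass.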
Alright, I guess the correct way to do this is with ed. This should be POSIX too.
cp file2 file3
while read -r line; do
# g/.../d deletes every matching line; a bare /.../d deletes only the next match
ed -s file3 <<EOF
g/$line/d
w
q
EOF
done < file1
In any case, grep seems to be the right tool for the job.
@byrondrossos's answer should work well for you ;)
Upvotes: 0