Reputation: 31
File 1 is formatted like this:
1111111111
2222222222
File 2 is formatted like this:
3333333333:4444444444
1111111111:2222222222
I am trying to find a way to take the values in file 1 and check whether they match only the part to the right of the colon in file 2. The ultimate goal is to remove the FULL line from file 2 whenever there is a match.
I know I can cut file 2 with standard commands so that it's formatted exactly like file 1. The problem is that I need the finished file in 88888:99999 format, and splitting the lines apart only to put them back together in the right order seems too complicated.
I've tried nested for loops, regex, sets, and lists; my head is spinning.
I hope this makes sense. Thanks in advance.
Traceback (most recent call last):
  File "test.py", line 17, in <module>
    if line.split(":")[1] in keys:
IndexError: list index out of range
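The IndexError above is what `line.split(":")[1]` raises when a line has no colon: splitting such a line gives a one-element list, so index 1 does not exist. The question doesn't show which line triggered it, so this is an assumption, but a blank line in file 2 is the most likely culprit. A minimal demonstration, with `str.partition` as a colon-safe alternative:

```python
# A line with no colon (here the empty string a blank line becomes after
# strip()) splits into a single-element list, so [1] raises IndexError.
parts = "".split(":")
print(parts)  # [''] -> only index 0 exists

# str.partition never raises: it always returns a 3-tuple, and the
# separator slot is empty when no colon was found.
for line in ["1111111111:2222222222", "", "no colon here"]:
    left, sep, right = line.partition(":")
    if not sep:
        print(f"skipping {line!r}: no colon")
        continue
    print(f"right-hand side of {line!r} is {right!r}")
```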
Upvotes: 3
Views: 200
Reputation: 160015
Assuming that you want to remove a line from file 2 whenever the part after its colon matches any value in file 1, you would do something like this:
# Warning: Untested code ahead
with open("file1", "r") as f1:
    # First, get the set of all the values in file 1
    # Sets use hash tables under the covers so this should
    # be fast enough for our use case (assuming file 1 fits
    # in the memory available on the system)
    keys = set(f1.read().splitlines())

# Since we can't write back into the same file as we read through it
# we'll pipe the valid lines into a new file
with open("file2", "r") as f2:
    with open("filtered_file", "w") as dest:
        for line in f2:
            line = line.strip()  # Remove newline and surrounding whitespace
            # Lines without a colon (e.g. blank lines) can't match, so keep
            # them; indexing [1] on them is what raises the IndexError
            if ":" in line and line.split(":", 1)[1] in keys:
                continue
            # File objects have write() and writelines(), not writeline(),
            # and strip() removed the newline, so add it back
            dest.write(line + "\n")
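The answer writes the result to a separate `filtered_file`. If you want file 2 itself to lose the matching lines, one common approach (sketched here, not part of the original answer) is to write to a temporary file in the same directory and then swap it over the original with `os.replace`; on POSIX systems that rename is atomic. The function name `remove_matching_lines` and its parameters are hypothetical.

```python
import os
import tempfile


def remove_matching_lines(keys_path: str, data_path: str) -> None:
    """Drop every line of data_path whose text after the first colon
    appears as a line in keys_path, then replace data_path in place.

    Hypothetical helper; the name and parameters are illustrative only.
    """
    with open(keys_path) as f1:
        # strip() also discards stray spaces; blank lines are ignored
        keys = {line.strip() for line in f1 if line.strip()}

    # Create the temporary file next to data_path so os.replace stays
    # on the same filesystem.
    directory = os.path.dirname(os.path.abspath(data_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, text=True)
    try:
        with os.fdopen(fd, "w") as dest, open(data_path) as src:
            for line in src:
                _, sep, right = line.rstrip("\n").partition(":")
                if sep and right.strip() in keys:
                    continue  # right-hand side matches file 1: drop the line
                dest.write(line)  # keep the original line, newline included
        os.replace(tmp_path, data_path)
    except BaseException:
        # Clean up the temporary file if anything failed before the swap
        os.remove(tmp_path)
        raise
```

Usage would be `remove_matching_lines("file1", "file2")`. Note that `tempfile.mkstemp` creates the file readable and writable only by its owner, so if file 2's permissions matter you could call `shutil.copymode(data_path, tmp_path)` before the `os.replace`.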
Upvotes: 3
Reputation: 1120
This is how you can get the elements to the right of the colon in file 2. It may not be the cleanest way, but you get the idea.
with open("file2") as f:  # close the file when done
    lines = f.read().splitlines()
righttocolon = [s.split(":")[1] for s in lines if len(s.split(":")) == 2]
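Applied to the sample data from the question (held in strings here rather than read from files), the comprehension yields the right-hand values, and reusing the same condition as a filter keeps whole lines, which is the asker's end goal. The names `file1_text`, `file2_text`, and `kept` are illustrative.

```python
# Sample data from the question, held in strings instead of files
file1_text = "1111111111\n2222222222\n"
file2_text = "3333333333:4444444444\n1111111111:2222222222\n"

keys = set(file1_text.splitlines())
lines = file2_text.splitlines()

# The comprehension from the answer: right-hand sides of "a:b" lines
righttocolon = [s.split(":")[1] for s in lines if len(s.split(":")) == 2]
print(righttocolon)  # ['4444444444', '2222222222']

# Same condition, but keep the whole line unless its right-hand side is in file 1
kept = [s for s in lines
        if not (len(s.split(":")) == 2 and s.split(":")[1] in keys)]
print(kept)  # ['3333333333:4444444444']
```

Because the filter tests each original line and keeps it intact, there is no need to split the file apart and reassemble it afterwards.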
Upvotes: 0