Reputation: 235
I am trying to compare two files, A and B. The goal is to find all the words that are in A but not in B. For example,
File A
my: 2
hello: 5
me: 1
File B
my
name
is
Output
hello
me
The code I have so far is
inFile = "fila.txt"
lexicon = "fileb.xml"
with open(inFile) as f:
    content = f.readlines()
content = [x.strip() for x in content]
with open(lexicon) as File:
    lexicon_file = File.readlines()
lexicon_file = [x.strip() for x in lexicon_file]
ordered_dict = {}
for line in content:
    key = line.split(":")[0].strip()
    value = int(line.split(":")[1].strip())
    ordered_dict[key] = value
for entry in lexicon_file:
    for (key, val) in ordered_dict.items():
        if entry == key:
            continue
        else:
            print(key)
However, this takes too long because of the nested loops, and it also prints duplicate words. How do I make this efficient?
Upvotes: 1
Views: 438
Reputation: 141
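Convert both collections into sets and just do a subtraction. Note that content still holds the raw "word: count" lines, so use the keys of ordered_dict rather than content:
content_wo_lexicon = list(set(ordered_dict) - set(lexicon_file))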
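To spell the set-based idea out against the example data from the question, here is a minimal sketch. The function names parse_counts and words_missing_from_lexicon are made up for illustration, and the file contents are inlined as lists of lines instead of being read from disk. It assumes every non-blank line of file A has the form "word: count". It uses a set only for the lexicon and walks the dict keys, which keeps the original word order (a plain set subtraction does not guarantee any order).

```python
def parse_counts(lines):
    """Parse 'word: count' lines into a dict mapping word -> int count."""
    counts = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Assumes a 'word: count' shape; a line without a count raises ValueError.
        word, _, count = line.partition(":")
        counts[word.strip()] = int(count)
    return counts


def words_missing_from_lexicon(counts, lexicon_lines):
    """Return the words in counts that are not in the lexicon, in original order."""
    lexicon = {line.strip() for line in lexicon_lines}  # set gives O(1) lookups
    return [word for word in counts if word not in lexicon]


# Example data copied from the question (File A and File B).
file_a = ["my: 2", "hello: 5", "me: 1"]
file_b = ["my", "name", "is"]

missing = words_missing_from_lexicon(parse_counts(file_a), file_b)
print(missing)  # ['hello', 'me']
```

Each word is checked against the lexicon once, so the work grows with the size of A plus the size of B rather than their product, and each missing word is printed only once.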
Upvotes: 3