Joey Joestar

Reputation: 235

How to compare individual words in two text files in Python

I am trying to compare two files, A and B. The goal is to find all the words that are in A but not in B. For example:

File A

my: 2
hello: 5
me: 1

File B

my
name
is

Expected output

hello
me

The code I have so far is

inFile = "fila.txt"
lexicon = "fileb.xml"

with open(inFile) as f:
    content = f.readlines()

content = [x.strip() for x in content]

with open(lexicon) as File:
    lexicon_file = File.readlines()

lexicon_file = [x.strip() for x in lexicon_file]

ordered_dict = {}

for line in content:
    key = line.split(":")[0].strip()
    value = int(line.split(":")[1].strip())
    ordered_dict[key] = value

for entry in lexicon_file:
    for (key, val) in ordered_dict.items():
        if entry == key:
            continue
        else:
            print(key)

However, this takes too long because of the nested loops, and it also prints duplicate words. How can I make it efficient?

Upvotes: 1

Views: 438

Answers (1)

dpkandy

Reputation: 141

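Convert both collections of words into sets and just do a subtraction. Using the names from your code, compare the keys of ordered_dict (the words without their counts) against lexicon_file:

content_wo_lexicon = list(set(ordered_dict) - set(lexicon_file))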
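
Note that a plain set difference does not preserve the order of the words in file A. If order matters, a minimal sketch along the following lines may help. It assumes file A holds "word: count" lines and file B holds one word per line, as in the question; the function name words_missing_from_lexicon and the in-memory sample lists are made up for illustration and stand in for the lines read from the two files.

```python
def words_missing_from_lexicon(count_lines, lexicon_lines):
    """Return words from 'word: count' lines that are absent from the lexicon.

    Words are returned once each, in the order they first appear.
    """
    # Set membership tests are O(1) on average, so no nested loop is needed.
    lexicon = {line.strip() for line in lexicon_lines if line.strip()}

    missing = []
    seen = set()  # guards against printing the same word twice
    for line in count_lines:
        word = line.split(":", 1)[0].strip()
        if word and word not in lexicon and word not in seen:
            seen.add(word)
            missing.append(word)
    return missing


# Sample data from the question, standing in for the two files.
file_a = ["my: 2", "hello: 5", "me: 1"]
file_b = ["my", "name", "is"]
print(words_missing_from_lexicon(file_a, file_b))  # ['hello', 'me']
```

This does one pass over each file instead of one pass over file A for every line of file B, which is where the original slowdown and the duplicate output come from.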

Upvotes: 3
