Reputation: 193
The first part of the script works (it removes http:// and www.). After that, I need to check whether the words in source are present in exists.
source = open('/net/sign/temp/python_tmp/script1/source.txt','r')
exists = open('/net/sign/temp/python_tmp/script1/exists.txt','r')
with source as f:
    lines = f.read()
    lines = lines.replace('http://','')
    lines = lines.replace('www.','')
for a in open('/net/sign/temp/python_tmp/script1/exists.txt'):
    if a == lines:
        print("ok")
The content of source.txt:
www.yahoo.it
www.yahoo.com
www.google.com
http://www.libero.it
The content of exists.txt:
www.yahoo.com
Upvotes: 3
Views: 252
Reputation: 78690
OK, judging from your example files, what you are actually trying to do is find the lines that both text files share. If your files are not gigantic, a simple solution is to read both files and compute the intersection of their sets of lines.
>>> with open('source.txt') as s, open('exists.txt') as e:
...     result = set(s).intersection(e)
...
>>> result
set(['www.yahoo.com\n'])
If you want to, you can remove 'http://' and 'www.' afterwards with
result = [x.replace('http://', '').replace('www.', '') for x in result]
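One caveat: intersecting raw lines only matches when the prefixes and trailing newlines agree exactly in both files (a last line without a newline would not match, for example). A minimal sketch, assuming it is acceptable to normalize each line before intersecting; the helper names `normalize` and `shared_hosts` are made up for illustration, and the sample lists stand in for the two files. Unlike `str.replace`, this version only removes the markers at the start of a line.

```python
def normalize(line):
    """Strip surrounding whitespace, then a leading 'http://' and 'www.'."""
    line = line.strip()
    for prefix in ('http://', 'www.'):
        if line.startswith(prefix):
            line = line[len(prefix):]
    return line


def shared_hosts(source_lines, exists_lines):
    """Return the set of normalized, non-empty entries found in both inputs."""
    source = {normalize(line) for line in source_lines if line.strip()}
    exists = {normalize(line) for line in exists_lines if line.strip()}
    return source & exists


# Sample data mirroring the question; open file objects work the same way,
# since iterating a file yields its lines.
source_lines = ['www.yahoo.it\n', 'www.yahoo.com\n',
                'www.google.com\n', 'http://www.libero.it\n']
exists_lines = ['www.yahoo.com']  # note: no trailing newline
print(shared_hosts(source_lines, exists_lines))
```

With real files you could pass the handles directly, e.g. `shared_hosts(s, e)` inside the `with open(...)` block.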
Upvotes: 3
Reputation: 5611
Something like this should work:
# Collect the normalized entries from source.txt
source_words = set()
with open('source.txt') as source:
    for word in source.readlines():
        source_words.add(word.replace('http://','').replace('www.','').strip())
# Collect the normalized entries from exists.txt
exist_words = set()
with open('exists.txt') as exist:
    for word in exist.readlines():
        exist_words.add(word.replace('http://','').replace('www.','').strip())
# A non-empty intersection means at least one entry is in both files
print("There {} words from 'source.txt' in 'exists.txt'".format(
    "are" if exist_words.intersection(source_words) else "aren't"
))
If you need the exact words that are present in both files, they are in the intersection result:
print("These words are in both files:")
for word in exist_words.intersection(source_words):
    print(word)
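Sets do not keep the order of the input, so the loop above may print matches in any order. If the order from source.txt matters, one option is to keep only the exists.txt side as a set and walk the source lines in sequence. This is a rough sketch under that assumption; `clean` and `matches_in_order` are hypothetical names, and the lists stand in for the file contents.

```python
def clean(line):
    """Apply the same cleanup as above: drop 'http://', 'www.' and whitespace."""
    return line.replace('http://', '').replace('www.', '').strip()


def matches_in_order(source_lines, exists_lines):
    """Return cleaned source entries that also occur in exists, in source order."""
    exist_words = {clean(line) for line in exists_lines}
    return [word for word in map(clean, source_lines)
            if word and word in exist_words]


# Illustrative data only; the second exists entry is invented for the demo.
source_lines = ['www.yahoo.it\n', 'www.yahoo.com\n',
                'www.google.com\n', 'http://www.libero.it\n']
exists_lines = ['www.yahoo.com\n', 'libero.it\n']
for word in matches_in_order(source_lines, exists_lines):
    print(word)
```

Membership tests against a set are constant time on average, so this stays linear in the total number of lines.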
Upvotes: 4