I'm using the following code to find all domains (as best as I can) in a text file. The problem is that it isn't finding any. I've tested the regex on regex101 and it matched fine. Can anyone point out the problem? tld.txt contains the full list of TLDs in lowercase, since I want to search for all of them.
Edit:
tld.txt looks like this:
com
in
domains.txt looks like this:
mplay.google.co.in
play.google.com
Code:
import re

# One TLD per line, e.g. "com", "in"
with open("tld.txt", "r") as f:
    tld = f.read().splitlines()
# One domain per line
with open("domains.txt", "r") as f:
    domains = f.read().splitlines()

for x in tld:
    regex = "^(.*?)" + str(x)
    for y in domains:
        domains_found = re.findall(regex, y)
print(domains_found)
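A quick way to see why nothing is printed: a minimal sketch that runs the same loop with the two sample files inlined as lists (an assumption, so it runs without tld.txt and domains.txt) and prints every intermediate result.

```python
import re

# Inline stand-ins for the sample files (assumption: the real script
# reads these from tld.txt and domains.txt).
tld = ["com", "in"]
domains = ["mplay.google.co.in", "play.google.com"]

for x in tld:
    regex = "^(.*?)" + str(x)
    for y in domains:
        # Each pass rebinds domains_found, discarding the previous result.
        domains_found = re.findall(regex, y)
        print(x, y, domains_found)

# Only the result of the last (TLD, domain) pair is left here.
print(domains_found)  # → []
```

The regex does match: the com/play.google.com pair gives ['play.google.'] and the in/mplay.google.co.in pair gives ['mplay.google.co.']. But the final pair (in, play.google.com) has no match, and that empty list is all domains_found holds when the loops end.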
Answer:
You are only printing the last result: each call to re.findall replaces the contents of domains_found instead of adding to it. Have you tried printing inside the loop, like this?
import re

with open("tld.txt", "r") as f:
    tld = f.read().splitlines()
with open("domains.txt", "r") as f:
    domains = f.read().splitlines()

for x in tld:
    regex = "^(.*?)" + str(x)
    for y in domains:
        domains_found = re.findall(regex, y)
        # Print each result before the next iteration overwrites it
        print(domains_found)
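Or, better, create the list once before the loops and extend it with every result:
domains_found = []  # before the loops
domains_found.extend(re.findall(regex, y))  # replaces the assignment in the inner loop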
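Putting the accumulating approach together, here is a minimal self-contained sketch. It inlines the sample data as lists (an assumption, standing in for the two files) and adds one refinement that is not part of the answer above: the TLD is passed through re.escape and anchored with a leading \. and a trailing $, so that "in" only matches a trailing ".in". The original pattern "^(.*?)in" would also match any name that merely contains "in", such as "linkedin.com".

```python
import re

# Inline stand-ins for tld.txt and domains.txt (assumption for this sketch).
tld = ["com", "in"]
domains = ["mplay.google.co.in", "play.google.com"]

domains_found = []  # created once, so results accumulate across iterations
for x in tld:
    # Escape the TLD and require it to be the final dot-separated label.
    regex = r"^(.*?)\." + re.escape(x) + r"$"
    for y in domains:
        domains_found.extend(re.findall(regex, y))

print(domains_found)  # → ['play.google', 'mplay.google.co']
```

The capture group returns the part before the TLD, as in the original pattern. Dropping the group (r"^.*\." + re.escape(x) + r"$") makes re.findall return the whole matching domain instead, giving ['play.google.com', 'mplay.google.co.in'] for the same data.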