temp44

Reputation: 35

Regex to search text file for domains

I'm using the following code to find all domains (as best I can) in a text file. The problem is that it isn't finding any. I've tested the regex on regex101 and it matched fine there. Can anyone point out the problem? tld.txt contains the full lowercase TLD list, since I want to search for all of them.

Edit:
tld.txt looks like this:

com
in

domains.txt looks like this:

mplay.google.co.in
play.google.com

Code

import re

with open("tld.txt", "r") as f:
    tld = f.read().splitlines()

with open("domains.txt","r") as f:
    domains = f.read().splitlines()
    for x in tld:
         regex = "^(.*?)"+str(x)
         for y in domains:
             domains_found = re.findall(regex, y)

print domains_found

Upvotes: 1

Views: 93

Answers (1)

sdocio

Reputation: 142

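You are only printing the last result: each call to re.findall replaces the contents of domains_found instead of adding to it, so if the final TLD/domain pair does not match, you see an empty list. Have you tried printing inside the loop, like this?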

import re

with open("tld.txt", "r") as f:
    tld = f.read().splitlines()

with open("domains.txt", "r") as f:
    domains = f.read().splitlines()

for x in tld:
    regex = "^(.*?)" + str(x)
    for y in domains:
        domains_found = re.findall(regex, y)
        # Print inside the loop so every result is shown, not just the last one
        print(domains_found)

Or better, create an empty list (domains_found = []) before the loops and add each batch of matches to it:

domains_found.extend(re.findall(regex, y))
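For reference, here is a minimal sketch of the accumulating version as a whole. It is not the exact code from the question: the function name find_domains is made up for illustration, the two lists stand in for the contents of tld.txt and domains.txt, and it assumes you want whole lines that end in ".<tld>", so it escapes the TLD with re.escape and anchors it to the end of the line with $.

```python
import re

def find_domains(tlds, lines):
    """Return every line that ends in '.<tld>' for one of the given TLDs."""
    found = []  # created once, before the loops, so results accumulate
    for tld in tlds:
        # re.escape guards against regex metacharacters in the TLD;
        # the literal dot and the trailing $ tie the TLD to the end of the line.
        pattern = re.compile(r"^(.*?)\." + re.escape(tld) + r"$")
        for line in lines:
            if pattern.search(line):
                found.append(line)
    return found

# Sample data copied from the question (stand-ins for the two files)
tlds = ["com", "in"]
lines = ["mplay.google.co.in", "play.google.com"]

result = find_domains(tlds, lines)
print(result)
```

With the sample data this collects both lines. If you really want the captured prefix rather than the whole line, swap found.append(line) for found.extend(pattern.findall(line)).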

Upvotes: 1
