Regex to search text file for domains

Question

I'm using the following code to find all domains (as best I can) in a text file. The problem is that it isn't finding any. I've tested the regex on regex101, where it matched fine. Can anyone point out the problem? tld.txt contains the full lowercase TLD list, as I want to search for all of them.

Edit:
tld.txt looks like this:

com
in

domains.txt looks like this:

mplay.google.co.in
play.google.com

Code

import re

with open("tld.txt", "r") as f:
    tld = f.read().splitlines()

with open("domains.txt","r") as f:
    domains = f.read().splitlines()
    for x in tld:
         regex = "^(.*?)"+str(x)
         for y in domains:
             domains_found = re.findall(regex, y)

print domains_found

sdocio · Accepted Answer

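You are only printing the last result: domains_found is reassigned on every pass through the inner loop, so each new result replaces the previous one instead of being added to it. With the sample files, the last pair checked is "in" against play.google.com, which does not match, so the final value is an empty list. Have you tried printing inside the loop?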

import re

with open("tld.txt", "r") as f:
    tld = f.read().splitlines()

with open("domains.txt", "r") as f:
    domains = f.read().splitlines()
    for x in tld:
        regex = "^(.*?)" + str(x)
        for y in domains:
            domains_found = re.findall(regex, y)
            # Print inside the loop so every result is shown, not just the last one
            print(domains_found)

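Or better, create an empty list before the loops (domains_found = []) and add each set of matches to it instead of reassigning:

domains_found.extend(re.findall(regex, y))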
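For reference, here is a minimal sketch of the accumulating version as a whole. It is written for Python 3 and uses in-memory lists in place of tld.txt and domains.txt so it runs on its own; the function name find_domains is just an illustrative choice, not something from the original code. The regex is kept exactly as in the question.

```python
import re

def find_domains(tlds, domains):
    """Collect the matches for every TLD/domain pair in one list."""
    domains_found = []  # created once, before the loops
    for x in tlds:
        # Same pattern as in the question: lazily capture everything
        # from the start of the line up to the first occurrence of the TLD.
        regex = "^(.*?)" + str(x)
        for y in domains:
            # extend() adds to the list instead of replacing its contents
            domains_found.extend(re.findall(regex, y))
    return domains_found

# Sample data copied from the question (stand-ins for the two files)
tlds = ["com", "in"]
domains = ["mplay.google.co.in", "play.google.com"]

print(find_domains(tlds, domains))
```

With the sample data this prints ['play.google.', 'mplay.google.co.'], because re.findall returns the contents of the capture group, which is the text before the TLD rather than the full domain.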
