Reputation: 15
I am running this program to fetch the page source of a URL I supply. It saves the source to a file, and I then want it to search that file for a specific string, namely @, to find email addresses. However, I can't get it to work.
import requests
import re
url = 'https://www.youtube.com/watch?v=GdKEdN66jUc&app=desktop'
data = requests.get(url)
# dump resulting text to file
with open("data6.txt", "w") as out_f:
    out_f.write(data.text)
with open("data6.txt", "r") as f:
    searchlines = f.readlines()
for i, line in enumerate(searchlines):
    if "@" in line:
        for l in searchlines[i:i+3]: print((l))
Upvotes: 0
Views: 76
Reputation: 15652
You can use the regex method re.findall to find all email addresses in your text content, and use file.read() instead of file.readlines() to get all the content as one string rather than split into separate lines.
For example:
import re
with open("data6.txt", "r") as file:
    content = file.read()
    emails = re.findall(r"[\w\.]+@[\w\.]+", content)
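To see what that pattern actually matches, here is a small sketch run on an in-memory string instead of the file. The sample text and addresses are made up for illustration, and the trailing-dot cleanup is an optional extra, not something the pattern does itself.

```python
import re

# Hypothetical sample text, standing in for the contents of data6.txt.
sample = "Write to alice@example.com or bob.smith@mail.example.org. Thanks"

# Same pattern as above: runs of word characters or dots on both sides of "@".
emails = re.findall(r"[\w\.]+@[\w\.]+", sample)
print(emails)

# The character class includes ".", so a sentence-ending period gets captured too.
# Stripping leading/trailing dots is one simple way to tidy that up.
cleaned = [e.strip(".") for e in emails]
print(cleaned)
```

Note that the pattern is deliberately loose: it will pick up anything shaped like text@text, which is usually fine for a quick scrape but is not a strict email validator.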
Afterwards, you could cast the result to a set to remove duplicates, and then save it to a file however you like.
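A minimal sketch of that last step, assuming you want one address per line. The helper names extract_unique_emails and save_emails, and the output filename emails.txt, are hypothetical choices for illustration.

```python
import re

def extract_unique_emails(text):
    """Return the unique email-like strings in text, sorted for stable output."""
    matches = re.findall(r"[\w\.]+@[\w\.]+", text)
    # set() drops duplicates; sorted() gives a predictable order.
    return sorted(set(matches))

def save_emails(emails, path):
    """Write one address per line to path."""
    with open(path, "w", encoding="utf-8") as out_f:
        for email in emails:
            out_f.write(email + "\n")

# Example usage (assumes data6.txt already exists):
# with open("data6.txt", "r", encoding="utf-8") as f:
#     save_emails(extract_unique_emails(f.read()), "emails.txt")
```

Passing encoding="utf-8" explicitly is a precaution: page source often contains characters that the platform default encoding cannot represent.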
Upvotes: 2