Kab2k

Reputation: 301

Can't apply a value to a list

I'm trying to store the email addresses scraped from a list of links, and I've run into two problems. First, for some reason email contains the same address twice. Second, the if statement detects that email has a value, but nothing ends up stored in the emails list. Thank you for helping out!

import re
import time

import requests

emails = []
comment = []

with open('comment.txt', 'r') as filehandle:
    for line in filehandle:
        currentPlace = line[:-1]
        comment.append(currentPlace)

print(emails)

i = 0
while i < len(comment) :
    url = str(comment[i]) + '/about'

    print("Crawling URL %s" % url)
    response = requests.get(url)

    email = re.findall(r"[a-z0-9\.\-+_]+@[a-z0-9\.\-+_]+\.[a-z]+", response.text, re.I)

    print(email)

    if email:
        emails.append(email)

    email.clear()

    i += 1
    time.sleep(0.2)

print(emails)

Output:

[]
Crawling URL ...
['[email protected]', '[email protected]']
Crawling URL ...
[]
Crawling URL ...
['[email protected]', '[email protected]']
Crawling URL ...
[]
Crawling URL ...
[]
[[], []]

My old code outputs correctly:

emails = set()
print("Crawling URL %s" % starting_url)

response = requests.get(starting_url)

new_emails = set(re.findall(r"[a-z0-9\.\-+_]+@[a-z0-9\.\-+_]+\.[a-z]+", response.text, re.I))
emails.update(new_emails)
print(emails)
# create a BeautifulSoup object for the HTML document
soup = BeautifulSoup(response.text, 'lxml')

Upvotes: 1

Views: 135

Answers (1)

Tom Dalton

Reputation: 6190

re.findall (https://docs.python.org/3/library/re.html#re.findall) returns a list of all matches of your regular expression. So your email regexp is finding 2 matches in the page text, presumably because the same address appears twice in the page's HTML.
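To illustrate, here is a minimal sketch of how re.findall reports every occurrence, duplicates included. The sample_html string and the example.com address are made up for the demonstration; the real pages may repeat the address for a different reason.

```python
import re

# Same pattern as in the question, compiled once for reuse.
EMAIL_RE = re.compile(r"[a-z0-9.\-+_]+@[a-z0-9.\-+_]+\.[a-z]+", re.I)

# Hypothetical page text: the address appears in the link target and in its label.
sample_html = '<a href="mailto:info@example.com">info@example.com</a>'

# findall returns a list with one entry per match, so the address shows up twice.
matches = EMAIL_RE.findall(sample_html)
print(matches)

# Converting to a set collapses the duplicates, as the old code did.
unique = set(matches)
print(unique)
```

This is also why the old code looked right: wrapping the result in set() silently dropped the repeated address.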

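You then do emails.append(email). But email is itself a list of emails, so your emails list becomes a list of lists, looking like [["[email protected]","[email protected]"], ["[email protected]","[email protected]"], ... ]. On top of that, append stores a reference to the list rather than a copy, so the email.clear() that follows empties the very list you just appended. That is why the final print shows [[], []].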
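One possible fix, as a sketch rather than a drop-in replacement: use extend to add the individual addresses and skip the clear() call entirely. The collect_emails helper and the sample strings below are hypothetical; in the question's loop, each text would be response.text.

```python
import re

EMAIL_RE = re.compile(r"[a-z0-9.\-+_]+@[a-z0-9.\-+_]+\.[a-z]+", re.I)


def collect_emails(pages):
    """Return a flat list of every address found in the given page texts."""
    emails = []
    for text in pages:
        found = EMAIL_RE.findall(text)
        # extend adds each address individually instead of nesting the list.
        emails.extend(found)
        # No found.clear() needed: found is rebound on the next iteration.
    return emails


# Hypothetical page texts standing in for response.text.
pages = ["contact: a@example.com", "no address here", "b@example.org or c@example.org"]
flat = collect_emails(pages)
print(flat)

# The aliasing problem in isolation: append keeps a reference, not a copy.
outer = []
inner = ["a@example.com"]
outer.append(inner)
inner.clear()  # empties the same object that outer holds
print(outer)
```

If duplicates are unwanted, emails can be a set updated with emails.update(found), which matches what the old code did.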

Upvotes: 2
