Reputation: 541
I am using imap_tools to get links from emails. The emails are very small, with very little text or graphics, and there are not many of them: around 20-40 spread through the day.
When a new email arrives, it takes between 10 and 25 seconds to scrape the link. This seems very long; I would have expected less than 2 seconds, and speed is important.
N.B. It is a shared mailbox, and I cannot simply fetch unseen emails because other users will often have opened them before the scraper gets to them.
Can anyone see what the issue is?
    import pandas as pd
    from imap_tools import MailBox, AND
    import re, time, datetime, os
    from config import email, password

    uids = []
    yahooSmtpServer = "imap.mail.yahoo.com"
    data = {
        'today': str(datetime.datetime.today()).split(' ')[0],
        'uids': []
    }
    while True:
        while True:
            try:
                client = MailBox(yahooSmtpServer).login(email, password, 'INBOX')
                try:
                    if not data['today'] == str(datetime.datetime.today()).split(' ')[0]:
                        data['today'] = str(datetime.datetime.today()).split(' ')[0]
                        data['uids'] = []
                    ds = str(datetime.datetime.today()).split(' ')[0].split('-')
                    msgs = client.fetch(AND(date_gte=datetime.date.today()))
                    for msg in msgs:
                        links = []
                        if str(datetime.datetime.today()).split(' ')[0] == str(msg.date).split(' ')[0] and not msg.uid in data['uids']:
                            mail = msg.html
                            if 'order' in mail and not 'cancel' in mail:
                                for i in re.findall(r'(https?://[^\s]+)', mail):
                                    if 'pick' in i:
                                        link = i.replace('"', "")
                                        link = link.replace('<', '>').split('>')[0]
                                        print(link)
                                        links.append(link)
                                        break
                            data['uids'].append(msg.uid)
                            scr_links = pd.DataFrame({'Links': links})
                            scr_links.to_csv('Links.csv', mode='a', header=False, index=False)
                    time.sleep(0.5)
                except Exception as e:
                    print(e)
                    pass
                client.logout()
                time.sleep(5)
            except Exception as e:
                print(e)
                print('sleeping for 5 sec')
                time.sleep(1)
Upvotes: 0
Views: 548
Reputation: 6726
I think the email server is throttling your requests, and that is where the delay comes from.
Take a look at IMAP IDLE.
Since version 0.51.0, imap_tools has IDLE support:
https://github.com/ikvk/imap_tools/releases/tag/v0.51.0
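A rough sketch of how the polling loop from the question could be rearranged around IDLE, assuming the server supports the IDLE extension. It keeps one connection open instead of logging in on every pass, asks for UIDs first, and downloads only messages it has not processed yet. The names `extract_pick_link`, `watch_inbox`, `seen_uids` and `idle_timeout` are made up for this example, and the regex is a slightly tightened variant of the one in the question (it stops at quotes and angle brackets instead of trimming them afterwards), so treat it as a starting point rather than a drop-in replacement.

```python
import datetime
import re

# Stop at whitespace, quotes and angle brackets so no trimming is needed later.
LINK_RE = re.compile(r'https?://[^\s"<>]+')


def extract_pick_link(html):
    """Return the first URL containing 'pick' from an order email, else None."""
    if 'order' not in html or 'cancel' in html:
        return None
    for url in LINK_RE.findall(html):
        if 'pick' in url:
            return url
    return None


def watch_inbox(host, user, password, idle_timeout=60):
    """Print pick links as mail arrives, waiting on IDLE between checks."""
    # Imported here so extract_pick_link can be used without imap_tools installed.
    from imap_tools import MailBox, AND

    seen_uids = set()  # UIDs already processed by this run (hypothetical bookkeeping)
    with MailBox(host).login(user, password, 'INBOX') as mailbox:
        while True:
            today = datetime.date.today()
            # Ask for UIDs only; this avoids downloading message bodies again.
            new_uids = [u for u in mailbox.uids(AND(date_gte=today))
                        if u not in seen_uids]
            if new_uids:
                # mark_seen=False leaves the flags alone for the other mailbox users.
                for msg in mailbox.fetch(AND(uid=new_uids), mark_seen=False):
                    seen_uids.add(msg.uid)
                    link = extract_pick_link(msg.html)
                    if link:
                        print(link)
            # Block until the server reports a change or the timeout expires.
            mailbox.idle.wait(timeout=idle_timeout)
```

Calling `watch_inbox("imap.mail.yahoo.com", email, password)` would replace the outer `while True` loops. Because the check does not rely on the unseen flag, it should still work when other users open the emails first.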
Upvotes: 1