Reputation: 77
Need help please!
After crawling a site and processing the data through pipelines, I need to send the scraped data via email. I've read and tried everything I could find but can't seem to connect the dots. In my pipelines I've tried the following:
import pprint
import smtplib
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


class EmailPipeline(object):
    def close_spider(self, spider):
        from_email = "[email protected]"
        to_email = "[email protected]"
        msg = MIMEMultipart()
        msg['From'] = from_email
        msg['To'] = to_email
        msg['Subject'] = 'Scrapper Results'
        intro = "Summary stats from Scrapy spider: \n\n"
        body = spider.crawler.stats.get_stats()
        body = pprint.pformat(body)
        body = intro + body
        msg.attach(MIMEText(body, 'plain'))
        # Port 465 expects implicit TLS, so use SMTP_SSL here;
        # SMTP(...) followed by starttls() is for port 587
        server = smtplib.SMTP_SSL("mailserver", 465)
        server.login("user", "password")
        text = msg.as_string()
        server.sendmail(from_email, to_email, text)
        server.quit()
Should I be sending the email from a pipeline or an extension, or is it a matter of preference? How would I implement it?
Thanks all!
Upvotes: 1
Views: 76
Reputation: 3561
Scrapy provides a MailSender class in the scrapy.mail module (it is built on Twisted's non-blocking IO rather than smtplib, so it does not block the crawler):
from scrapy.mail import MailSender
mailer = MailSender()
mailer.send(to=["[email protected]"], subject="Some subject", body="Some body", cc=["[email protected]"])
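One detail worth adding: constructed with no arguments as above, MailSender falls back to its defaults (localhost, port 25). For a real mail server it is usually built from the project settings instead, with MailSender.from_settings(settings). Below is a sketch of the relevant settings.py entries, assuming a server that accepts implicit TLS on port 465. Every value is a placeholder, not something taken from the question.

```python
# settings.py (all values below are placeholders; replace with your own)
MAIL_FROM = "scrapy@example.com"   # address used in the From header
MAIL_HOST = "smtp.example.com"     # SMTP server host name
MAIL_PORT = 465                    # 465 = implicit TLS, 587 = STARTTLS
MAIL_USER = "user"                 # SMTP login name
MAIL_PASS = "password"             # SMTP password
MAIL_SSL = True                    # connect over SSL/TLS from the start
MAIL_TLS = False                   # set True (and MAIL_SSL False) for STARTTLS on 587
```

With these in place, mailer = MailSender.from_settings(spider.settings) inside close_spider should pick them up.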
Upvotes: 2
Reputation: 769
Here is a file you can use; import its send_mail function. You will need to change a few things (the addresses, SMTP host, password and attachment subtype) to make it work for your situation. Calling it from a pipeline, as you are doing, is a correct way to go about it.
import smtplib
from email.mime.multipart import MIMEMultipart
from email.mime.application import MIMEApplication


def send_mail(filename):
    sender = '[email protected]'
    receiver = '[email protected]'
    msg = MIMEMultipart()
    msg['Subject'] = 'Subject text here'
    msg['From'] = sender
    msg['To'] = receiver
    # Read the file; MIMEApplication base64-encodes the payload
    with open(filename, "rb") as fo:
        att = MIMEApplication(fo.read(), _subtype="pdf")
    msg.attach(att)
    try:
        smtpObj = smtplib.SMTP(host='smtp.host.com', port=587)
        smtpObj.ehlo()
        # Upgrade the plain connection to TLS before logging in
        smtpObj.starttls()
        smtpObj.login(sender, 'your password')
        smtpObj.sendmail(sender, receiver, msg.as_string())
        smtpObj.quit()
        print('SUCCESSFULLY SENT EMAIL')
    except Exception as e:
        print("SEND E-MAIL FAILED WITH EXCEPTION: {}".format(e))
Here is another piece, to find the most recently modified file in your output directory:
import os
import glob

download_dir = "/full/path/to/files/"


def get_newest_file():
    print("Finding latest pdf file")
    file_list = glob.glob(os.path.join(download_dir, '*.pdf'))
    if not file_list:
        # max() raises ValueError on an empty list, so bail out early
        return None
    # getmtime is the last-modified time; getctime means something else on Unix
    latest_file = max(file_list, key=os.path.getmtime)
    print("Latest file: {}".format(latest_file))
    return latest_file
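Since the question was about mailing the crawl stats rather than a file, here is a minimal sketch of that variant using only the standard library. The names build_stats_message, send_message and EmailStatsPipeline are hypothetical, and the host, port, addresses and credentials are placeholders. Picking the TLS mode from the port number is a simplifying assumption, not something every mail server follows.

```python
import pprint
import smtplib
from email.message import EmailMessage


def build_stats_message(stats, sender, recipient, subject="Scraper results"):
    """Build a plain-text email whose body is the pretty-printed stats dict."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content("Summary stats from Scrapy spider:\n\n" + pprint.pformat(stats))
    return msg


def send_message(msg, host, port, user, password):
    """Send msg, choosing the TLS mode from the port (an assumption)."""
    if port == 465:
        # Implicit TLS: the connection is encrypted from the start
        with smtplib.SMTP_SSL(host, port) as server:
            server.login(user, password)
            server.send_message(msg)
    else:
        # Explicit TLS: connect in plain text, then upgrade with STARTTLS
        with smtplib.SMTP(host, port) as server:
            server.starttls()
            server.login(user, password)
            server.send_message(msg)


class EmailStatsPipeline:
    """Hypothetical pipeline that mails the crawl stats when the spider closes."""

    def close_spider(self, spider):
        msg = build_stats_message(
            spider.crawler.stats.get_stats(),
            sender="sender@example.com",        # placeholder
            recipient="recipient@example.com",  # placeholder
        )
        # Placeholder server details; replace with your own
        send_message(msg, "smtp.example.com", 587, "user", "password")
```

Keeping the message building separate from the sending means you can check the email contents without touching a mail server. Note that close_spider only runs if the pipeline is enabled in ITEM_PIPELINES.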
Upvotes: 0