Stefanpt

Reputation: 77

Using Extension to send email with scraped data

Need help please!

After crawling a site and processing the returned data through pipelines, I need to send the scraped data via email. I've tried and read everything I could find but can't seem to connect the dots. In my pipelines I've tried the following:

import pprint
import smtplib
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


class EmailPipeline(object):
    def close_spider(self, spider):
        from_email = "[email protected]"
        to_email = "[email protected]"

        msg = MIMEMultipart()
        msg['From'] = from_email
        msg['To'] = to_email
        msg['Subject'] = 'Scraper Results'

        intro = "Summary stats from Scrapy spider: \n\n"

        body = spider.crawler.stats.get_stats()
        body = pprint.pformat(body)
        body = intro + body
        msg.attach(MIMEText(body, 'plain'))

        # Port 465 expects TLS from the first byte, so use SMTP_SSL.
        # (smtplib has no startssl(); starttls() is meant for port 587.)
        server = smtplib.SMTP_SSL("mailserver", 465)
        server.login("user", "password")
        text = msg.as_string()
        server.sendmail(from_email, to_email, text)
        server.quit()

Should I send the email from a pipeline or from an extension, or is it a matter of preference? How would I implement it?

Thanks all!

Upvotes: 1

Views: 76

Answers (2)

Georgiy

Reputation: 3561

Scrapy provides its own MailSender class in scrapy.mail. It is built on Twisted's non-blocking IO rather than on smtplib, so sending does not block the crawl:

from scrapy.mail import MailSender
mailer = MailSender()
mailer.send(to=["[email protected]"], subject="Some subject", body="Some body", cc=["[email protected]"])
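
To address the pipeline-versus-extension part: either can work, but an extension hooked to the spider_closed signal keeps mailing separate from item processing. Below is a minimal sketch under stated assumptions, not a tested drop-in. The class name ResultsMailer and the RESULTS_MAIL_TO setting are made up for illustration; MailSender.from_settings reads the MAIL_* settings (MAIL_HOST, MAIL_FROM, MAIL_USER, MAIL_PASS and so on) from your project.

```python
import pprint


class ResultsMailer:
    """Hypothetical extension: emails the crawl stats when the spider closes."""

    def __init__(self, mailer, recipients):
        self.mailer = mailer          # any object with a MailSender-like send()
        self.recipients = recipients  # list of address strings

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy imports are kept local to this method so the class itself
        # can be exercised without Scrapy installed.
        from scrapy import signals
        from scrapy.exceptions import NotConfigured
        from scrapy.mail import MailSender

        # RESULTS_MAIL_TO is an assumed, project-specific setting name.
        recipients = crawler.settings.getlist("RESULTS_MAIL_TO")
        if not recipients:
            raise NotConfigured("RESULTS_MAIL_TO is empty")

        ext = cls(MailSender.from_settings(crawler.settings), recipients)
        crawler.signals.connect(ext.spider_closed, signal=signals.spider_closed)
        return ext

    def spider_closed(self, spider):
        stats = spider.crawler.stats.get_stats()
        body = "Summary stats from Scrapy spider:\n\n" + pprint.pformat(stats)
        # Return whatever send() returns (a Deferred with the real MailSender).
        return self.mailer.send(
            to=self.recipients,
            subject="Scraper results for {}".format(spider.name),
            body=body,
        )
```

You would enable it in settings.py with something like EXTENSIONS = {"myproject.extensions.ResultsMailer": 500} (the module path is hypothetical). Handlers of spider_closed may return a Deferred, so returning the one from send() should let Scrapy wait for the mail before shutting down. Scrapy also ships a built-in StatsMailer extension that mails the stats to the addresses in STATSMAILER_RCPTS, which may already be enough if stats are all you need.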

Upvotes: 2

eusid

Reputation: 769

Here is a file with a send_mail function that you can import. You will need to change a few things (addresses, SMTP host, password, attachment type) to make it work for your situation. Sending the email from a pipeline, as you are doing, is a correct approach.

import os
import smtplib
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart


def send_mail(filename):
    sender = '[email protected]'
    receiver = '[email protected]'
    msg = MIMEMultipart()
    msg['Subject'] = 'Subject text here'
    msg['From'] = sender
    msg['To'] = receiver
    # Read the file; MIMEApplication base64-encodes the payload by default
    with open(filename, "rb") as fo:
        att = MIMEApplication(fo.read(), _subtype="pdf")
    # Give the attachment a file name so mail clients display it properly
    att.add_header('Content-Disposition', 'attachment',
                   filename=os.path.basename(filename))
    msg.attach(att)
    try:
        smtpObj = smtplib.SMTP(host='smtp.host.com', port=587)
        smtpObj.ehlo()
        smtpObj.starttls()
        smtpObj.login(sender, 'your password')
        smtpObj.sendmail(sender, receiver, msg.as_string())
        smtpObj.quit()
        print('SUCCESSFULLY SENT EMAIL')
    except Exception as e:
        print("SEND E-MAIL FAILED WITH EXCEPTION: {}".format(e))

Here is another piece that finds the most recently modified file in your output directory:

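import os
import glob

download_dir = "/full/path/to/files/"

def get_newest_file():
    print("Finding latest pdf file")
    file_list = glob.glob(os.path.join(download_dir, '*.pdf'))
    if not file_list:
        # max() raises ValueError on an empty list, so bail out early
        print("No pdf files found")
        return None
    # getmtime is the last-modified time; getctime is the creation time on
    # Windows but the metadata-change time on Unix
    latest_file = max(file_list, key=os.path.getmtime)
    print("Latest file: {}".format(latest_file))
    return latest_file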
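
If your output files are not always PDFs, the standard library can guess the MIME type from the file name instead of hard-coding _subtype="pdf". Here is a minimal sketch using email.message.EmailMessage. The function name build_message and its parameters are made up for illustration, and any addresses you pass in are up to you.

```python
import mimetypes
import os
from email.message import EmailMessage


def build_message(sender, recipient, subject, body, attachment_path):
    """Build an email with one attachment whose MIME type is guessed from its name."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)

    # guess_type returns (type, encoding); fall back to a generic binary
    # type when the type is unknown or the file is compressed.
    ctype, encoding = mimetypes.guess_type(attachment_path)
    if ctype is None or encoding is not None:
        ctype = "application/octet-stream"
    maintype, subtype = ctype.split("/", 1)

    with open(attachment_path, "rb") as fh:
        msg.add_attachment(
            fh.read(),
            maintype=maintype,
            subtype=subtype,
            filename=os.path.basename(attachment_path),
        )
    return msg
```

The returned object can be handed to smtpObj.send_message(msg) in place of the sendmail(...) call.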

Upvotes: 0
