Reputation: 1109
I have a Scrapy spider that runs well. What I need to do is make an API call from inside the parse method and use the results from the response in the same method, with the same items. How do I do this? The only simple thing that comes to mind is to use the Python requests library, but I am not sure whether that works in Scrapy and, moreover, on Scrapinghub. Is there any built-in solution? Here is an example.
def agency(self, response):
    # inspect_response(response, self)
    agents = response.xpath('//a[contains(@class,"agency-carousel__item")]')
    Agencie_Name = response.xpath('//h1[@class = "agency-header__name"]/text()').extract_first()
    Business_Adress = response.xpath('//div[@class = "agency-header__address"]//text()').extract()
    Phone = response.xpath('//span[@class = "modal-item__text"]/text()').extract_first()
    Website = response.xpath('//span[@class = "modal-item__text"][contains(text(),"Website")]/../@href').extract_first()
    if Website:
        pass
        # 1. send request to hunter.io and get the pattern. Apply to the entire team. Pass as meta
        # 2. do something with this pattern in here, using info from this page.
So here I normally extract all the info from the Scrapy response, and if the Website variable is populated I need to send an API call to hunter.io to get the email pattern for this domain, and use it to generate emails in the same method. Hope that makes sense.
Upvotes: 2
Views: 919
Reputation: 7297
As for vanilla Scrapy on your own PC / server, there is no problem accessing third-party libraries inside a scraper. You can just do whatever you want, so something like this is no problem at all (it fetches a mail address from an API using requests and then sends out a mail using smtplib).
import requests
import smtplib
from email.mime.text import MIMEText

[...]

if Website:
    r = requests.get('https://example.com/mail_for_site?url=%s' % Website, auth=('user', 'pass'))
    mail = r.json()['Mail']

    msg = MIMEText('This will be the perfect job offer for you. ......')
    msg['Subject'] = 'Perfect job for you!'
    msg['From'] = 'sender@example.com'
    msg['To'] = mail

    s = smtplib.SMTP('example.com')
    s.sendmail('sender@example.com', [mail], msg.as_string())
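Applied to the question's hunter.io scenario, the pattern returned by such an API still has to be expanded into actual addresses for the team members scraped from the page. A minimal sketch of that step — the `'{first}.{last}'` placeholder format and the `emails_from_pattern` helper are assumptions for illustration, not hunter.io's documented response format:

```python
def emails_from_pattern(pattern, names, domain):
    """Expand an email pattern such as '{first}.{last}' into full
    addresses for a list of (first_name, last_name) tuples."""
    emails = []
    for first, last in names:
        # Fill the placeholders with lowercased name parts
        local_part = pattern.format(first=first.lower(), last=last.lower())
        emails.append('%s@%s' % (local_part, domain))
    return emails

# Example: expand the pattern for two team members scraped from the page
print(emails_from_pattern('{first}.{last}',
                          [('Jane', 'Doe'), ('John', 'Smith')],
                          'example.com'))
# -> ['jane.doe@example.com', 'john.smith@example.com']
```

The generated addresses can then be attached to the item in the same parse method, as the question intends.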
However, as for Scrapinghub, I do not know. For this I can only give you a developer's point of view, because I also develop a managed scraping platform.
I assume that sending an HTTP(S) request using requests would not be any problem at all. They do not gain security by blocking it, because HTTP(S) traffic is allowed for Scrapy anyway. So if somebody wanted to do harmful attacks with requests over HTTP(S), they could just make the same requests with Scrapy.
However, SMTP might be another matter; you'd have to try. It is possible that they do not allow SMTP traffic from their servers, because it is not required for scraping tasks and can be abused for sending spam. However, since there are legitimate uses for sending mail during a scraping process (e.g. error reports), it may well be that SMTP is perfectly fine on Scrapinghub, too (and they employ rate limiting or something else against spam).
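If you want to find out without deploying the full mail-sending spider, a small probe that merely tries to open an SMTP connection will tell you whether outbound SMTP is reachable from the platform. A minimal sketch using only the standard library — the host and port are placeholders you would replace with your own mail server:

```python
import smtplib

def smtp_available(host, port=25, timeout=5):
    """Return True if an SMTP connection and EHLO succeed,
    False on DNS failure, timeout, or refused connection."""
    try:
        with smtplib.SMTP(host, port, timeout=timeout) as server:
            server.ehlo()
        return True
    except (OSError, smtplib.SMTPException):
        return False

# A reserved, non-resolvable host fails fast, so the probe
# degrades gracefully instead of raising
print(smtp_available('smtp.invalid'))
# -> False
```

Run once from a scheduled job on the platform, this answers the question directly; if it returns False there, fall back to an HTTP-based mail API (which, per the reasoning above, should always be allowed).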
Upvotes: 1