Deba

Reputation: 441

mail ID can't be scraped

I am trying to scrape the email ID from this page using Scrapy, Python and RegEx: https://allevents.in/bangalore/project-based-summer-training-program/1851553244864163

For that purpose, I wrote the following commands, each of which returned an empty list:

response.xpath('//a/*[@href = "#"]/text()').extract()

response.xpath('//a/@onclick').extract()

response.xpath('//a/@onclick/text()').extract()

response.xpath('//span/*[@class = ""]/a/text()').extract()

response.xpath('//a/@onclick/text()').extract()

Apart from these, I planned to extract the email ID from the description using RegEx. For that, I wrote a command to scrape the description, but it returned everything except the email ID at the end of the description:

response.xpath('//*[@property = "schema:description"]/text()').extract()

The output of the above command is:

[u'\n\t\t\t\t\t\t\t     "Your Future is created by what you do today Let\'s shape it With Summer Training Program \u2026\u2026\u2026 ."', u'\n', u'\nWith ever changing technologies & methodologies, the competition today is much greater than ever before. The industrial scenario needs constant technical enhancements to cater to the rapid demands.', u'\nHT India Labs is presenting Summer Training Program to acquire and clear your concepts about your respective fields. ', u'\nEnroll on ', u' and avail Early bird Discounts.', u'\n', u'\nFor Registration or Enquiry call 9911330807, 7065657373 or write us at ', u'\t\t\t\t\t\t']
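The gaps in this output (after "Enroll on" and "write us at") suggest that the missing pieces live inside child elements, which a single-slash `/text()` step does not reach; in XPath, `//text()` selects descendant text nodes as well. The sketch below illustrates the same distinction using only the standard library's `xml.etree.ElementTree` as a stand-in for Scrapy's selectors. The markup in `SNIPPET` is hypothetical, as the real page's structure is an assumption here.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup, loosely modelled on the description block.
# The real page's structure is an assumption.
SNIPPET = (
    '<div property="schema:description">'
    'Enroll on <a href="#">example link</a> and avail Early bird Discounts. '
    'write us at <span>user | example ! com</span>'
    '</div>'
)

root = ET.fromstring(SNIPPET)

# Direct text only: the text before the first child plus each child's tail.
# This mirrors what a single-slash /text() step returns.
direct_text = [root.text] + [child.tail for child in root]
direct_text = [t for t in direct_text if t]

# Descendant text: itertext() walks the whole subtree in document order,
# which mirrors a double-slash //text() step.
all_text = "".join(root.itertext())

print(direct_text)
print(all_text)
```

In this sketch the direct text nodes skip the contents of the `<a>` and `<span>` children, while the descendant walk includes them.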

Upvotes: 0

Views: 35

Answers (1)

SIM

Reputation: 22440

I don't know much about the onclick event attribute. I suppose that when it is set to return false, the request usually skips that portion. However, if you try the approach shown below, you may get a result very close to what you want.

import requests
from scrapy import Selector

# Fetch the page with requests and parse the HTML with Scrapy's Selector.
res = requests.get("https://allevents.in/bangalore/project-based-summer-training-program/1851553244864163")
sel = Selector(text=res.text)

# The address sits in a <span> inside the description block.
for item in sel.css("div[property='schema:description']"):
    emailid = item.css("span::text").extract_first()
    print(emailid)

Output:

htindialabsworkshops | gmail ! com
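
The scraped string is obfuscated, so it still needs a small clean-up step before it can be used as an address. Below is a minimal sketch of that step with the `re` module. It assumes that `|` stands in for `@` and `!` stands in for `.`, which is a guess based on this one sample; the helper name `deobfuscate_email` is hypothetical.

```python
import re


def deobfuscate_email(raw):
    """Turn 'name | domain ! tld' into 'name@domain.tld' (assumed scheme)."""
    # Assumption: '|' replaces '@' and '!' replaces '.', each padded by spaces.
    candidate = re.sub(r"\s*\|\s*", "@", raw.strip())
    candidate = re.sub(r"\s*!\s*", ".", candidate)
    # Return the result only if it looks like an address; otherwise None.
    if re.fullmatch(r"[\w.+-]+@[\w-]+(\.[\w-]+)+", candidate):
        return candidate
    return None


print(deobfuscate_email("htindialabsworkshops | gmail ! com"))
```

The final check is deliberately loose: it only guards against returning strings that clearly are not addresses, and returns None in that case.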

Upvotes: 1
