How to Scrape Empty String for those with no Email

Question

I'm trying to scrape a webpage that lists people and their info (Phone, Name, Position, Email, etc.). Some people are missing a phone number or an email, and that causes trouble: I scrape each field into its own list and then combine the lists, so whenever a value isn't scraped the indexes no longer line up.

This is how I'm scraping email:

response.xpath('//ul//div[@class="contact-text contact-email ctaType-email"]/a/@title').extract()

I'm getting emails from people with this HTML code:


    test@gmail.com

However, it's totally skipping over people with this HTML code and messing up my list indexes.

Is there any way to make it scrape the empty email field, or insert a placeholder string for it, so that I can easily combine the fields?

Thanks much!

Granitosaurus · Accepted Answer

You can simply split your extraction into two parts:

1. Extract all people nodes.
2. For every person node, extract the email or an empty string.

For example:

people = response.xpath('//ul//div[@class="contact-text contact-email ctaType-email"]')
# extract_first() returns one string per node, or the default when nothing matches
emails = [p.xpath('a/@title').extract_first(default='') for p in people]
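To see why the per-node query keeps the indexes aligned, here is a small stand-alone sketch. It uses the standard library's ElementTree rather than Scrapy selectors so that it runs without a crawl, and the markup in `SAMPLE` is hypothetical (the second person is assumed to have the email container but no link inside it).

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the second person has the container but no <a> link.
SAMPLE = """
<ul>
  <li><div class="contact-text contact-email ctaType-email"><a title="test@gmail.com">Email</a></div></li>
  <li><div class="contact-text contact-email ctaType-email"></div></li>
  <li><div class="contact-text contact-email ctaType-email"><a title="other@example.com">Email</a></div></li>
</ul>
"""

root = ET.fromstring(SAMPLE)
EMAIL_DIV = ".//div[@class='contact-text contact-email ctaType-email']"

# Flat query: only links that exist are returned, so later positions shift.
flat = [a.get("title") for a in root.findall(EMAIL_DIV + "/a")]

# Per-node query: exactly one result per container, '' when the link is missing.
aligned = []
for div in root.findall(EMAIL_DIV):
    link = div.find("a")
    aligned.append(link.get("title", "") if link is not None else "")

print(flat)     # two entries for three people
print(aligned)  # three entries, one per person
```

The flat list has lost the information about which person had no email, while the per-node list keeps one slot per person and can be zipped safely with other per-person lists.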

The usual approach to avoid issues like this is to extract the item nodes first and then iterate through them, building one item per person:

people = response.xpath('//ul//div[@class="contact-text contact-email ctaType-email"]')
for person in people:
    item = dict()
    # a single string per person; '' when the link is missing
    item['email'] = person.xpath('a/@title').extract_first(default='')
    item['something_else'] = person.xpath('...')
    # ...
    yield item
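The same item-per-person pattern can be tried outside a spider. The following is only a sketch under stated assumptions: it uses the standard library's ElementTree instead of Scrapy, the page structure (one `<li>` per person) and the `name` and `phone` class names are invented for illustration, and `text_or_default` and `parse_people` are hypothetical helpers.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: one <li> per person; "name" and "phone" classes are invented.
PAGE = """
<ul>
  <li><span class="name">Ann</span><span class="phone">555-0100</span>
      <div class="contact-text contact-email ctaType-email"><a title="test@gmail.com">Email</a></div></li>
  <li><span class="name">Bob</span></li>
</ul>
"""

def text_or_default(node, path, default=""):
    """Return the stripped text of the first match under node, or default."""
    found = node.find(path)
    return found.text.strip() if found is not None and found.text else default

def parse_people(root):
    """Yield one dict per person; missing fields become empty strings."""
    for person in root.findall(".//li"):
        link = person.find("div[@class='contact-text contact-email ctaType-email']/a")
        yield {
            "name": text_or_default(person, "span[@class='name']"),
            "phone": text_or_default(person, "span[@class='phone']"),
            "email": link.get("title", "") if link is not None else "",
        }

items = list(parse_people(ET.fromstring(PAGE)))
for item in items:
    print(item)
```

Because every field is looked up relative to its own person node, a missing phone or email only affects that person's item, and there are no separate lists to realign afterwards.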
