DanOpi

Reputation: 143

How to Scrape Empty String for those with no Email

I'm trying to scrape a webpage that lists people and their details (name, position, phone, email, etc.). Some people are missing a phone number or an email address, and that causes trouble: I scrape each field into its own list and then combine the lists, so when a value is missing the lists have different lengths and the indexes no longer line up.

This is how I'm scraping email:

response.xpath('//ul//div[@class="contact-text contact-email ctaType-email"]/a/@title').extract()

I'm getting emails from people with this HTML code:

<div class="contact-text contact-email ctaType-email">
    <a itemprop="email" href="mailto:[email protected]" alt="[email protected]" title="[email protected]">[email protected]</a>
</div>

However, it skips people whose HTML looks like this, which throws off my list indexes:

<div class="contact-text contact-email ctaType-email">

</div>

Is there any way to make it return an empty string for the empty email field, so that I can easily combine the fields or put a placeholder string in the empty ones?

Thanks much!

Upvotes: 0

Views: 74

Answers (1)

Granitosaurus

Reputation: 21436

You can simply split your extraction into two parts:

  1. Extract all people nodes.
  2. For every person node, extract the email or fall back to an empty string.

For example:

people = response.xpath('//ul//div[@class="contact-text contact-email ctaType-email"]')
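# extract_first() returns a single string, or the default when nothing matches,
# so every person contributes exactly one entry to the list
emails = [p.xpath('a/@title').extract_first(default='') for p in people]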

The usual way to avoid issues like this is to select the item nodes first and then iterate over them, extracting each field relative to the current node:

people = response.xpath('//ul//div[@class="contact-text contact-email ctaType-email"]')
for person in people:
    item = dict()
    # '' when this person has no email link
    item['email'] = person.xpath('a/@title').extract_first(default='')
    item['something_else'] = person.xpath('...')
    # ...
    yield item
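
To see the per-node approach above in isolation, here is a minimal sketch that runs without a crawl. It uses the standard library's xml.etree.ElementTree instead of Scrapy selectors, and the sample markup (the li wrapper, the span with class "name", the names and addresses) is made up for illustration; only the email div's class comes from the question. The point is the same: query relative to each person and supply a default, so every person yields exactly one value per field.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample markup; the <li>/<span class="name">
# structure and the addresses are assumptions, not taken from the real page.
SAMPLE = """
<ul>
  <li>
    <span class="name">Alice</span>
    <div class="contact-text contact-email ctaType-email">
      <a itemprop="email" href="mailto:alice@example.com" title="alice@example.com">alice@example.com</a>
    </div>
  </li>
  <li>
    <span class="name">Bob</span>
    <div class="contact-text contact-email ctaType-email">
    </div>
  </li>
</ul>
"""

EMAIL_CLASS = "contact-text contact-email ctaType-email"


def extract_people(markup):
    """Return one dict per <li>, using '' for any missing field."""
    root = ET.fromstring(markup)
    people = []
    for li in root.iter("li"):
        # Both lookups are relative to the current person node.
        name = li.find("span[@class='name']")
        link = li.find(f"div[@class='{EMAIL_CLASS}']/a")
        people.append({
            "name": name.text if name is not None else "",
            "email": link.get("title", "") if link is not None else "",
        })
    return people


people = extract_people(SAMPLE)
for person in people:
    print(person)
```

Because each record is built from a single node, there are no parallel lists to keep aligned; a missing email simply becomes an empty string in that person's dict.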

Upvotes: 1
