Reputation: 143
I'm trying to scrape a webpage that has people and their info on it (Phone, Name, Position, Email, etc). Some of the people are missing either a phone number or email and I'm having trouble with this because I combine lists and if it doesn't scrape a string the indexes will be different.
This is how I'm scraping email:
response.xpath('//ul//div[@class="contact-text contact-email ctaType-email"]/a/@title').extract()
I'm getting emails from people with this HTML code:
<div class="contact-text contact-email ctaType-email">
<a itemprop="email" href="mailto:[email protected]" alt=
"[email protected]" title="[email protected]">[email protected]</a>
</div>
However, it completely skips people with this HTML code, which throws off my list indexes:
<div class="contact-text contact-email ctaType-email">
</div>
Is there any way to make it scrape the empty email field as well, or to add a placeholder string for it, so that I can easily combine the fields?
Thanks much!
Upvotes: 0
Views: 74
Reputation: 21436
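You can split the extraction into two steps: first select every email container div (including the empty ones), then extract the address relative to each div, falling back to an empty string when there is no link.
For example: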
people = response.xpath('//ul//div[@class="contact-text contact-email ctaType-email"]')
emails = [p.xpath('a/@title').extract_first(default='') for p in people]
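To see why the per-node version keeps the indexes aligned, here is a minimal sketch that runs without Scrapy. It uses the standard library's xml.etree.ElementTree as a stand-in for Scrapy selectors, and the sample markup and addresses (alice@example.com, carol@example.com) are made up for illustration, not taken from the page in the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample markup: the second person has no email link.
HTML = """
<ul>
  <li>
    <div class="contact-text contact-email ctaType-email">
      <a itemprop="email" title="alice@example.com">alice@example.com</a>
    </div>
  </li>
  <li>
    <div class="contact-text contact-email ctaType-email">
    </div>
  </li>
  <li>
    <div class="contact-text contact-email ctaType-email">
      <a itemprop="email" title="carol@example.com">carol@example.com</a>
    </div>
  </li>
</ul>
"""

EMAIL_DIV = ".//div[@class='contact-text contact-email ctaType-email']"


def flat_emails(root):
    """One query straight to the links: empty divs produce no entry at all."""
    return [a.get("title", "") for a in root.findall(EMAIL_DIV + "/a")]


def aligned_emails(root):
    """Select every div first, then look inside each one, defaulting to ''."""
    emails = []
    for div in root.findall(EMAIL_DIV):
        link = div.find("a")
        emails.append(link.get("title", "") if link is not None else "")
    return emails


root = ET.fromstring(HTML)
print(flat_emails(root))     # two entries, positions no longer match people
print(aligned_emails(root))  # three entries, one per person
```

The flat query returns one entry fewer than there are people, which is exactly the misalignment described in the question; the per-div loop returns one entry per div.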
The usual approach to avoid issues like this is to extract the item nodes first and then iterate through them, so that every field is extracted relative to its own node:
people = response.xpath('//ul//div[@class="contact-text contact-email ctaType-email"]')
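for person in people:
    item = dict()
    # extract_first() returns the default instead of an empty list when the link is missing
    item['email'] = person.xpath('a/@title').extract_first(default='')
    item['something_else'] = person.xpath('...')
    # ...
    yield item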
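As a rough, runnable illustration of the item-per-node pattern, the sketch below builds one dict per person, so a missing email can never shift another field. It again uses xml.etree.ElementTree in place of Scrapy selectors, and the li class "person", the span class "name", and the sample values are assumptions invented for the example; the real page structure may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: each <li class="person"> wraps all fields of one person.
HTML = """
<ul>
  <li class="person">
    <span class="name">Alice</span>
    <div class="contact-text contact-email ctaType-email">
      <a title="alice@example.com">alice@example.com</a>
    </div>
  </li>
  <li class="person">
    <span class="name">Bob</span>
    <div class="contact-text contact-email ctaType-email">
    </div>
  </li>
</ul>
"""

EMAIL_LINK = "div[@class='contact-text contact-email ctaType-email']/a"


def parse_people(root):
    """Yield one dict per person node; missing fields become empty strings."""
    for person in root.findall(".//li[@class='person']"):
        link = person.find(EMAIL_LINK)  # None when the div is empty
        yield {
            "name": person.findtext("span[@class='name']", default="").strip(),
            "email": link.get("title", "") if link is not None else "",
        }


items = list(parse_people(ET.fromstring(HTML)))
for item in items:
    print(item)
```

Because every field is looked up relative to the person node, there are no parallel lists to zip together afterwards.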
Upvotes: 1