Reputation: 3094
I'm trying to scrape the following HTML:
```html
<ul class="results-list" id="search-results">
  <li>
    <h3 class="name">First John</h3>
    <div class="details">
      <a href="mailto:[email protected]" class="email">email</a>
      <span class="phone">999999999</span>
    </div>
  </li>
  <li>
    <h3 class="name">Second John</h3>
    <div class="details">
      <a href="mailto:[email protected]" class="email">email</a>
      <span class="phone">999999999</span>
    </div>
  </li>
</ul>
```
When I run my spider, I get two rows containing the same information. I have Name, Email and Phone columns, and, for example, the Name column contains "First John,Second John" in both rows.
My Scrapy code is the following:
```python
people = response.xpath('//ul[@class="results-list"]/li')
for person in people:
    item = SpiderItem()
    item['Name'] = person.xpath(
        '//h3/text()').extract()
    item['Email'] = person.xpath(
        '//div[@class="details"]/a/@href').extract()
    item['Phone'] = person.xpath(
        '//div[@class="details"]/span[@class="phone"]/text()').extract()
    yield item
```
However, when I run `scrapy crawl MySpider -o output.csv`, I get the same information in every row.
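The comma-joined cell value is a side effect of `extract()` returning a list: Scrapy's `CsvItemExporter` joins list values using its `join_multivalued` argument, which defaults to `','`. The stdlib-only sketch below imitates that joining step so the symptom can be reproduced without a crawl; `to_csv` and the sample `items` are hypothetical and only approximate the real exporter.

```python
import csv
import io

# Hypothetical items shaped like the question's output: the absolute
# '//h3/text()' query matched every <h3> on the page, so extract()
# returned the full list of names for each item.
items = [
    {"Name": ["First John", "Second John"], "Phone": ["999999999", "999999999"]},
    {"Name": ["First John", "Second John"], "Phone": ["999999999", "999999999"]},
]


def to_csv(rows, join_multivalued=","):
    """Write rows as CSV, joining list values with join_multivalued
    (an approximation of what Scrapy's CSV exporter does by default)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["Name", "Phone"], lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow({
            key: join_multivalued.join(value) if isinstance(value, list) else value
            for key, value in row.items()
        })
    return buf.getvalue()


print(to_csv(items))
```

Each cell ends up holding both names (quoted by the `csv` module because they contain the delimiter), which matches the "First John,Second John" seen in both rows.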
Upvotes: 0
Views: 77
Reputation: 18799
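You are using absolute paths in your XPath expressions. A path that starts with `//` is evaluated from the root of the document, even when you call it on `person`, so every item gets all the names, emails and phones on the page. Make the paths relative to the current `<li>` by starting them with `.`, and use `extract_first()` so each field holds a single string instead of a list: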
```python
for person in people:
    item = SpiderItem()
    item['Name'] = person.xpath(
        './/h3/text()').extract_first()
    item['Email'] = person.xpath(
        './/div[@class="details"]/a/@href').extract_first()
    item['Phone'] = person.xpath(
        './/div[@class="details"]/span[@class="phone"]/text()').extract_first()
    yield item
```
Upvotes: 1