Reputation: 53
I'm using Scrapy to collect data about books on amazon.com. I only want the name, author, and price of each book, and I want to do this by category, for example computer science books.
Consider this HTML snippet from an Amazon page:
<div class="a-row">
::before
<div class="a-column a-span7">
<div class="a-row a-spacing-none">...</div>
<div class="a-row a-spacing-none">...</div>
<hr class="a-divider-normal s-result-divier">
<div class="a-row a-spacing-none">...</div>
<div class="a-row a-spacing-none">...</div>
<div class="a-row a-spacing-none">...</div>
</div>
<div class="a-column a-span5 a-span-last"></div>
::after
</div>
So I tried to get the div elements inside div[@class="a-column a-span7"], but only the first two div elements are returned. The commands I used were:
>>> books = response.selector.xpath('.//div[@class="a-fixed-left-grid-col a-col-right"]')
>>> abook = books[0].xpath('.//div[@class="a-row"]')
>>> prices = abook.xpath('.//div[@class="a-column a-span7"]')
>>> len(prices.xpath('div'))
2
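The code above selects the right-hand column of each result, takes the div[@class="a-row"] elements of the first book, finds the div[@class="a-column a-span7"] inside them, and counts its direct div children.
I've tried different ways to get the div elements after the <hr> tag, but Scrapy seems to stop at the <hr> tag. I also tried the following code, and the result shows just two elements: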
>>> abook.xpath('div')
[<Selector xpath='div' data=u'<div class="a-column a-span7"><div class'>, <Selector xpath='div' data=u'<div class="a-column a-span5 a-span-last'>]
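One way to check whether the <hr> tag itself can hide the later div elements is to count the children in the snippet exactly as the browser shows it. The sketch below is only an illustration: it uses Python's standard html.parser instead of Scrapy's selectors, the names ChildDivCounter and count_child_divs are made up for this example, and the ::before and ::after lines are left out on the assumption that they are pseudo-elements shown by the browser inspector rather than markup.

```python
from html.parser import HTMLParser

# The snippet from the question, without the ::before/::after inspector lines.
SNIPPET = """
<div class="a-row">
  <div class="a-column a-span7">
    <div class="a-row a-spacing-none">...</div>
    <div class="a-row a-spacing-none">...</div>
    <hr class="a-divider-normal s-result-divier">
    <div class="a-row a-spacing-none">...</div>
    <div class="a-row a-spacing-none">...</div>
    <div class="a-row a-spacing-none">...</div>
  </div>
  <div class="a-column a-span5 a-span-last"></div>
</div>
"""


class ChildDivCounter(HTMLParser):
    """Counts <div> elements that are direct children of a target <div>."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.depth = 0            # nesting depth of currently open <div> tags
        self.target_depth = None  # depth of the target <div> while it is open
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return  # other tags, including <hr>, do not affect div nesting
        self.depth += 1
        if self.target_depth is None:
            if dict(attrs).get("class") == self.target_class:
                self.target_depth = self.depth
        elif self.depth == self.target_depth + 1:
            self.count += 1  # a direct child of the target <div>

    def handle_endtag(self, tag):
        if tag != "div":
            return
        if self.depth == self.target_depth:
            self.target_depth = None  # the target <div> has been closed
        self.depth -= 1


def count_child_divs(html, target_class):
    """Return the number of direct <div> children of the matching <div>."""
    parser = ChildDivCounter(target_class)
    parser.feed(html)
    parser.close()
    return parser.count


print(count_child_divs(SNIPPET, "a-column a-span7"))
```

If the count for the browser's markup is higher than the 2 reported in the Scrapy shell, that would suggest (though not prove) that the HTML Scrapy downloaded differs from what the browser displays, rather than parsing stopping at <hr>.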
I've spent some time on this and couldn't solve the problem, although I think it's very simple.
This link (stackref) has some explanations about the use of the <br> and <hr> tags, but they're not clear to me.
Upvotes: 4
Views: 305
Reputation: 3396
The problem you're facing can be resolved by sending a user agent
with your request. Try something like this and check your results:
scrapy shell "http://www.amazon.com.br/s/ref=lp_12008582011_nr_n_2?fst=as%3Aoff&rh=n%3A6740748011%2Cn%3A%218169561011%2Cn%3A%218169562011%2Cn%3A12008582011%2Cn%3A12008596011&bbn=12008582011&ie=UTF8&qid=1448202280&rnid=12008582011" -s USER_AGENT='Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.80 Safari/537.36'
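The same setting can also be kept in the project instead of being passed with -s each time. A minimal sketch, assuming a standard Scrapy project that has a settings.py module; USER_AGENT is the setting name used in the command above, and the value is copied from it unchanged:

```python
# settings.py (assumed: the settings module of your Scrapy project)
# Scrapy sends this string as the User-Agent header on requests.
USER_AGENT = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_2) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/46.0.2490.80 Safari/537.36"
)
```

With this in place, the spider and the shell started inside that project should use the browser-like user agent without the extra command-line option.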
Upvotes: 2