Reputation: 26668
I am working with Scrapy, scraping a site and using XPath to extract items.
Some of the div elements contain JavaScript. When my XPath goes as far as the div that contains the JavaScript code, it returns an empty list; when I leave that div (the one containing JavaScript) out of the expression, I am able to fetch the HTML data.
HTML code
<div class="subContent2">
  <div id="contentDetails">
    <div class="eventDetails">
      <h2>
        <a href="javascript:;" onclick="jdevents.getEvent(117032)">Some data</a>
      </h2>
    </div>
  </div>
</div>
Spider Code
# Import paths as used by legacy Scrapy (0.x) releases
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector

class ExampleSpider(BaseSpider):
    name = "example"
    domain_name = "www.example.com"
    start_urls = ["http://www.example.com/jkl/index.php"]

    def parse(self, response):
        hxs = HtmlXPathSelector(response)
        required_data = hxs.select('//div[@class="subContent2"]/div[@id="contentDetails"]/div[@class="eventDetails"]')
So how can I get the text ("Some data") from the anchor tag inside the h2 element shown above? Is there an alternative way in Scrapy to fetch data from elements that contain JavaScript?
Upvotes: 2
Views: 2821
Reputation: 59604
<div class="subContent2">
  <div id="contentDetails">
    <div class="eventDetails">
      <h2>
        <a href="javascript:;" onclick="jdevents.getEvent(117032)">Some data</a>
      </h2>
    </div>
  </div>
</div>
In this case, the JavaScript is not what prevents you from getting the 'Some data' string.
You need either to select the text node under the anchor:
required_data = hxs.select('//div[@class="subContent2"]/div[@id="contentDetails"]/div[@class="eventDetails"]/h2/a/text()')
or to use the XPath string() function:
required_data = hxs.select('string(//div[@class="subContent2"]/div[@id="contentDetails"]/div[@class="eventDetails"])')
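Both approaches can be tried outside a spider. The following is only a rough sketch and does not use Scrapy: it assumes the standard-library xml.etree.ElementTree as a stand-in, which supports just a subset of XPath (no text() or string()), so element.text plays the role of /h2/a/text() and itertext() approximates string(). The html/body wrapper and the EVENT_PATH name are added for illustration.

```python
import xml.etree.ElementTree as ET

# The snippet from the question, wrapped in html/body (an assumption,
# so that the ".//" search has a parent to start from).
HTML = """
<html><body>
<div class="subContent2">
  <div id="contentDetails">
    <div class="eventDetails">
      <h2>
        <a href="javascript:;" onclick="jdevents.getEvent(117032)">Some data</a>
      </h2>
    </div>
  </div>
</div>
</body></html>
"""

root = ET.fromstring(HTML.strip())

# Same path as in the question, in ElementTree's limited XPath syntax.
EVENT_PATH = (
    ".//div[@class='subContent2']"
    "/div[@id='contentDetails']"
    "/div[@class='eventDetails']"
)

# Approach 1: walk down to the <a> subnode and read its text.
link = root.find(EVENT_PATH + "/h2/a")
link_text = link.text

# Approach 2: take the concatenated text of the whole div,
# which is roughly what XPath string() returns for that node.
event_div = root.find(EVENT_PATH)
string_value = "".join(event_div.itertext())

print(repr(link_text))
print(repr(string_value.strip()))
```

The javascript: href and the onclick handler are ordinary attributes here and do not stop the path from matching. Note that the string() style result carries the surrounding whitespace from the markup, so it usually needs a strip().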
Upvotes: 2