Reputation: 349
I am trying to scrape some links from a webpage in Python with Scrapy and XPath, but the elements I want to scrape are between ::before and ::after, so XPath can't see them: they do not exist in the HTML but are dynamically created with JavaScript. Is there a way to scrape those elements?
::before
<div class="well-white">...</div>
<div class="well-white">...</div>
<div class="well-white">...</div>
::after
Upvotes: 2
Views: 1312
Reputation: 1
Very easy!
You just use "Absolute XPath" and "Relative XPath" (https://www.guru99.com/xpath-selenium.html) together. With this trick you can get past ::before (and maybe ::after). For example, in your case, I assumed that
//div[@id='"+FindField+"']//following::td[@class='KKKK']
is before your "div":
FindField = 'the "id" associated with your "div"'
driver.find_element_by_xpath("//div[@id='" + FindField + "']//following::td[@class='KKKK']/div")
NOTE: only one "/" must be used before the final "div". You can also use only "Absolute XPath" for the whole address (note: the address must start with "//").
Upvotes: 0
Reputation: 21406
I can't replicate your exact document state.
However, if you load the page, you can see some template markup in the same format as your example data.
Also, if you check the XHR tab of the network inspector, you can see that some AJAX requests for JSON data are being made.
So you can download all the data you are looking for in handy JSON format here:
http://ec.europa.eu/research/participants/portal/data/call/amif/amif_topics.json
scrapy shell "http://ec.europa.eu/research/participants/portal/data/call/amif/amif_topics.json"
> import json
> data = json.loads(response.text)
> data['topicData']['Topics'][0]
{'topicId': 1259874, 'ccm2Id': 31081390, 'subCallId': 910867, ...
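Once the JSON is loaded, getting at the fields is plain dictionary and list access. Below is a minimal sketch, assuming the payload keeps the topicData -> Topics nesting seen in the shell output above. The helper name extract_topic_ids and the inline SAMPLE string are only illustrative; the sample is trimmed to the three fields visible above, and the real response has more.

```python
import json

# Trimmed sample mirroring the structure seen in the shell output above.
# The real response contains more fields per topic; this is an assumption
# used only so the sketch runs without a network request.
SAMPLE = (
    '{"topicData": {"Topics": ['
    '{"topicId": 1259874, "ccm2Id": 31081390, "subCallId": 910867}'
    ']}}'
)


def extract_topic_ids(raw_json):
    """Return the topicId of every entry under topicData -> Topics."""
    data = json.loads(raw_json)
    # Fall back to empty containers so a missing key yields an empty list.
    topics = data.get("topicData", {}).get("Topics", [])
    return [topic["topicId"] for topic in topics if "topicId" in topic]


if __name__ == "__main__":
    print(extract_topic_ids(SAMPLE))
```

Inside a Scrapy callback the same helper could be given the response text instead of the sample string.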
Upvotes: 1