Mr Alexander

Reputation: 349

Scraping HTML elements between ::before and ::after with scrapy and xpath

I am trying to scrape some links from a webpage in Python with Scrapy and XPath, but the elements I want to scrape sit between ::before and ::after, so XPath can't see them: they do not exist in the HTML but are created dynamically with JavaScript. Is there a way to scrape those elements?

::before
<div class="well-white">...</div>
<div class="well-white">...</div>
<div class="well-white">...</div>
::after

This is the actual page http://ec.europa.eu/research/participants/portal/desktop/en/opportunities/amif/calls/amif-2018-ag-inte.html#c,topics=callIdentifier/t/AMIF-2018-AG-INTE/1/1/1/default-group&callStatus/t/Forthcoming/1/1/0/default-group&callStatus/t/Open/1/1/0/default-group&callStatus/t/Closed/1/1/0/default-group&+identifier/desc
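The ::before and ::after markers in the snippet are CSS pseudo-elements that browser developer tools display; they are not DOM nodes, so they are not what hides the divs from XPath. The real question is whether the divs are present in the HTML that Scrapy downloads. A minimal sketch of that check, using only the standard library's html.parser. The helper names (ClassCounter, count_elements) and both HTML strings are hypothetical stand-ins for a server response and the browser-rendered DOM, not markup taken from the real page:

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Counts start tags of one tag name whose class attribute contains css_class."""

    def __init__(self, tag: str, css_class: str) -> None:
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag != self.tag:
            return
        # attrs is a list of (name, value) pairs; value may be None.
        classes = (dict(attrs).get("class") or "").split()
        if self.css_class in classes:
            self.count += 1


def count_elements(html: str, tag: str = "div", css_class: str = "well-white") -> int:
    """Return how many <tag class="... css_class ..."> start tags appear in raw HTML."""
    parser = ClassCounter(tag, css_class)
    parser.feed(html)
    parser.close()
    return parser.count


# Hypothetical markup: what a server might send before JavaScript runs,
# versus what the browser's DOM inspector shows after rendering.
raw_html = '<div id="results"></div><script src="app.js"></script>'
rendered_html = (
    '<div id="results">'
    '<div class="well-white">...</div>'
    '<div class="well-white">...</div>'
    '<div class="well-white">...</div>'
    "</div>"
)

print(count_elements(raw_html))       # 0
print(count_elements(rendered_html))  # 3
```

In practice you would pass response.text from scrapy shell to count_elements; a result of 0 means no XPath expression can find the divs in that response, and the data has to come from somewhere else.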

Upvotes: 2

Views: 1312

Answers (2)

antonio

Reputation: 1

Very easy! Just combine an "Absolute XPath" and a "Relative XPath" (https://www.guru99.com/xpath-selenium.html). With this trick you can get past ::before (and maybe ::after). For example, in your case (assuming that //div[@id='"+FindField+"']//following::td[@class='KKKK'] is before your "div"):

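from selenium.webdriver.common.by import By
FindField = 'your "id" associated to the "div"'
# driver is an existing Selenium WebDriver; find_element(By.XPATH, ...) replaces the removed find_element_by_xpath
driver.find_element(By.XPATH, "//div[@id='" + FindField + "']//following::td[@class='KKKK']/div")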

NOTE: only a single "/" must be used before div. You can also use only an "Absolute XPath" for the whole address (note: it must start with "//").
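To see what the expression above actually selects, here is a small sketch with lxml (the library Scrapy's selectors are built on). The markup and the "anchor" id are hypothetical; find_field stands in for FindField. The following:: axis selects nodes that come after the matched div in document order, and the final /div picks the div child of each matching td:

```python
from lxml import etree

# Hypothetical markup: a div with a known id, followed later in the document
# by a <td class="KKKK"> that wraps the <div> we actually want.
doc = etree.fromstring(
    "<html><body>"
    "<div id='anchor'>label</div>"
    "<table><tr><td class='KKKK'><div>target</div></td></tr></table>"
    "</body></html>"
)

find_field = "anchor"  # hypothetical id, stands in for FindField
xpath = "//div[@id='" + find_field + "']//following::td[@class='KKKK']/div"

matches = doc.xpath(xpath)
print([m.text for m in matches])  # ['target']
```

In Scrapy the same string can be passed to response.xpath(). Note that any XPath, however it is built, can only match elements that are present in the HTML Scrapy downloaded; it does not execute JavaScript the way a Selenium-driven browser does.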

Upvotes: 0

Granitosaurus

Reputation: 21406

I can't replicate your exact document state.
However, if you load the page you can see some template language loaded in the same format as your example data.

Also, if you check the XHR tab of the network inspector, you can see that AJAX requests for JSON data are being made.

So you can download all the data you are looking for in handy JSON format here:

http://ec.europa.eu/research/participants/portal/data/call/amif/amif_topics.json

scrapy shell "http://ec.europa.eu/research/participants/portal/data/call/amif/amif_topics.json"
> import json
> data = json.loads(response.text)
> data['topicData']['Topics'][0]
{'topicId': 1259874, 'ccm2Id': 31081390, 'subCallId': 910867, ...
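A minimal sketch of the same approach outside the shell, assuming the payload keeps the topicData / Topics shape shown above. extract_topics and fetch_topics are hypothetical helper names, and the endpoint is the one found via the network inspector at the time of this answer; it may no longer respond the same way:

```python
import json
from urllib.request import urlopen

# JSON endpoint found in the browser's XHR/network inspector.
TOPICS_URL = (
    "http://ec.europa.eu/research/participants/portal/data/call/amif/amif_topics.json"
)


def extract_topics(body: str) -> list:
    """Return the list of topic dicts from the JSON payload.

    Assumes the shape {"topicData": {"Topics": [...]}}; missing keys give [].
    """
    data = json.loads(body)
    return data.get("topicData", {}).get("Topics", [])


def fetch_topics(url: str = TOPICS_URL) -> list:
    """Download the JSON with the standard library and extract the topics."""
    with urlopen(url, timeout=30) as resp:
        return extract_topics(resp.read().decode("utf-8"))


# Tiny hypothetical payload with the same shape (values copied from the shell output above).
sample = '{"topicData": {"Topics": [{"topicId": 1259874, "ccm2Id": 31081390, "subCallId": 910867}]}}'
topics = extract_topics(sample)
print(topics[0]["topicId"])  # 1259874
```

Inside a Scrapy callback, extract_topics(response.text) does the same job, so the spider can request the JSON URL directly instead of the HTML page.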

Upvotes: 1
