How to scrape hidden text from a web page?

Question

I am trying to scrape some text from a web page. The page shows a list of words: some are visible, and others only become visible when I click on "+ More". Once expanded, the list is always the same (same words, same order), but some words are in bold and others are struck through (deleted). So each item in the database has a set of features, and for each item I would like to know which features are available and which are not. My problem is getting past the "+ More" button.

My script works fine for the words that are shown, but not for those hidden behind "+ More". What I am trying to do is collect all the words that sit inside a "del" node. I initially thought that lxml would load the web page as it appears in Chrome's Inspect Element, so I wrote my code accordingly:

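from lxml import html

# br is assumed to be an already-created mechanize.Browser() instance
# and current_url the address of the page being scraped.
tree = html.fromstring(br.open(current_url).get_data())

mydata = {}

# Double quotes inside the XPath so they do not end the Python string.
if len(tree.xpath('//del[text()="some text"]')) > 0:
    mydata['some text'] = 'text is deleted from the web page!'
else:
    mydata['some text'] = 'text is not deleted'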

Every time I run this code, I collect only the part of the data that is initially shown on the web page, not the complete list of words that appears after clicking "+ More".
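
For reference, a minimal sketch of the same per-word check on static HTML. It uses the standard library's html.parser in place of lxml so that it runs without extra packages. The sample markup, the word list, and the names DelTextCollector and feature_status are hypothetical, not taken from the real page. The approach only helps if the hidden words are already present in the downloaded source; if "+ More" adds them with JavaScript, they will not be in that source at all.

```python
from html.parser import HTMLParser


class DelTextCollector(HTMLParser):
    """Collect the text found inside <del> elements of an HTML document."""

    def __init__(self):
        super().__init__()
        self._depth = 0       # number of <del> tags currently open
        self._buffer = []     # text pieces of the <del> being read
        self.deleted = set()  # finished, whitespace-stripped <del> texts

    def handle_starttag(self, tag, attrs):
        if tag == "del":
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "del" and self._depth > 0:
            self._depth -= 1
            if self._depth == 0:
                self.deleted.add("".join(self._buffer).strip())
                self._buffer = []

    def handle_data(self, data):
        # Only keep text that sits inside a <del> element.
        if self._depth > 0:
            self._buffer.append(data)


def feature_status(page_source, words):
    """Map each word to a message saying whether it appears inside <del>."""
    parser = DelTextCollector()
    parser.feed(page_source)
    parser.close()
    status = {}
    for word in words:
        if word in parser.deleted:
            status[word] = "text is deleted from the web page!"
        else:
            status[word] = "text is not deleted"
    return status


# Hypothetical page: the second list is hidden by CSS but still in the source.
sample_html = """
<ul><li><b>wifi</b></li><li><del>parking</del></li></ul>
<ul style="display:none"><li><del>pool</del></li><li><b>gym</b></li></ul>
"""

mydata = feature_status(sample_html, ["wifi", "parking", "pool", "gym"])
```

Here the CSS-hidden "pool" entry is still found, because hiding with a style attribute does not remove the element from the HTML that was downloaded.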

I have tried Selenium, but as far as I understand it is meant for interacting with the web page rather than for parsing it. However, if I run this:

from selenium import webdriver

driver = webdriver.Chrome()
driver.get('https://www.mywebpage.co.uk')

a = driver.find_element_by_xpath('//del[text()="some text"]')

I either get the element or an error. I would like to get an empty list so I could do:

mydata = {}

if len(driver.find_element_by_xpath('//del[text()="some text"]')) > 0:
    mydata['some text'] = 'text is deleted from the web page!'
else:
    mydata['some text'] = 'text is not deleted'

or find another way to get these "hidden" elements captured by the script.
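
One possible way to get the empty-list behaviour described above is the plural find_elements method, which returns an empty list when nothing matches instead of raising an error. The sketch below is only an illustration: deletion_status is a hypothetical helper name, it assumes the driver has already loaded the page (and that "+ More" has been clicked if the words are only added on click), and it assumes the words contain no double quotes.

```python
def deletion_status(driver, words):
    """Return a dict mapping each word to a status message.

    `driver` is assumed to be a Selenium WebDriver that has already
    loaded the page to inspect.
    """
    mydata = {}
    for word in words:
        # find_elements (plural) returns [] when nothing matches, whereas
        # find_element (singular) raises NoSuchElementException.
        # "xpath" is the locator strategy string behind By.XPATH.
        # Assumption: `word` contains no double quotes.
        matches = driver.find_elements("xpath", '//del[text()="%s"]' % word)
        if len(matches) > 0:
            mydata[word] = 'text is deleted from the web page!'
        else:
            mydata[word] = 'text is not deleted'
    return mydata


# Possible usage with a real browser (not executed here):
# from selenium import webdriver
# driver = webdriver.Chrome()
# driver.get('https://www.mywebpage.co.uk')
# mydata = deletion_status(driver, ['some text'])
```

Passing the driver in as an argument keeps the check separate from the page-loading and clicking steps, which depend on how the "+ More" control is built on the actual page.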

Has anyone had this type of problem? How did you sort it out?

