Starry_Night

Reputation: 1

Parsing "Further reading" with Selenium and Python

I need to parse the text of the "Further reading" section of a Wikipedia page. My code opens Google, enters a search query (for example, 'Bill Gates'), and then finds the URL of the Wikipedia page in the results. Now I need to parse the text under "Further reading", but I do not know how. Here is the code:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

URL = "https://www.google.com/"
adress = input()  # search query, for example: Bill Gates

def main():
    driver = webdriver.Chrome()
    driver.get(URL)
    # find_element_by_* helpers were removed in Selenium 4.3; use find_element(By...)
    element = driver.find_element(By.NAME, "q")
    element.send_keys(adress, Keys.ARROW_DOWN)
    element.send_keys(Keys.ENTER)
    elems = driver.find_elements(By.CSS_SELECTOR, ".r [href]")
    link = [elem.get_attribute('href') for elem in elems]
    url = link[0]    # link to the Wikipedia page


if __name__ == "__main__":
    main()

And here's the HTML code:

<h2>
<span class="mw-headline" id="Further_reading">Further reading</span>
</h2>
<ul>
<li>...</li>
<li>...</li>
<li>...</li>
<li>...</li>
...
</ul>
<h3>
<span class="mw-headline" id="Primary_sources">Primary sources</span>
</h3>
<ul>
<li>...</li>
<li>...</li>
<li>...</li>
...
</ul>

URL: https://en.wikipedia.org/wiki/Bill_Gates

Upvotes: 0

Views: 32

Answers (1)

Svetlana Levinsohn

Reputation: 1556

On this page, the "Further reading" section sits between two h2 headings. To collect its text, select the ul elements that lie between those two h2 elements. This is the code that worked for me:

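from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
# Open the page:
driver.get('https://en.wikipedia.org/wiki/Bill_Gates')
# Search for the element and get its text
# (find_element_by_xpath was removed in Selenium 4.3, so find_element with By.XPATH is used):
further_read = driver.find_element(By.XPATH, "//ul[preceding-sibling::h2[./span[@id='Further_reading']] and following-sibling::h2[./span[@id='External_links']]]").text
print(further_read)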
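
To make the "ul elements between two h2s" idea easier to follow, here is a minimal sketch of the same selection done by hand, without a browser. It assumes a simplified, well-formed fragment shaped like the HTML in the question; the SAMPLE markup, the list item texts, and the helper name items_between_headings are all made up for illustration, and the real Wikipedia markup may differ. Note that the XPath above can match more than one ul (the "Primary sources" list also sits between the two h2s), while a single-element lookup returns only the first match, so this sketch collects every matching list.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified fragment modelled on the HTML in the question.
SAMPLE = """
<div>
  <h2><span class="mw-headline" id="Further_reading">Further reading</span></h2>
  <ul>
    <li>Book A</li>
    <li>Book B</li>
  </ul>
  <h3><span class="mw-headline" id="Primary_sources">Primary sources</span></h3>
  <ul>
    <li>Source C</li>
  </ul>
  <h2><span class="mw-headline" id="External_links">External links</span></h2>
  <ul>
    <li>Link D</li>
  </ul>
</div>
""".strip()


def items_between_headings(root, start_id, end_id):
    """Collect <li> texts from every <ul> sibling that follows the h2 whose
    span has id=start_id and precedes the h2 whose span has id=end_id."""
    collecting = False
    items = []
    for child in root:
        span = child.find("span")
        span_id = span.get("id") if span is not None else None
        if child.tag == "h2" and span_id == start_id:
            collecting = True          # start heading reached
        elif child.tag == "h2" and span_id == end_id:
            break                      # end heading reached, stop
        elif collecting and child.tag == "ul":
            items.extend("".join(li.itertext()).strip()
                         for li in child.findall("li"))
    return items


root = ET.fromstring(SAMPLE)
further_reading = items_between_headings(root, "Further_reading", "External_links")
print(further_reading)
```

With Selenium, the equivalent would be to ask for all matching elements rather than one and join their text, if you also want the lists under the h3 subheadings.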

I hope this helps, good luck.

Upvotes: 1
