I need to parse the text of the Further reading section of a Wikipedia article. My code opens Google, enters a search query (for example 'Bill Gates'), and then finds the URL of the Wikipedia page. Now I need to parse the text under Further reading, but I don't know how. Here is the code:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

URL = "https://www.google.com/"
address = input()  # search query, e.g. Bill Gates

def main():
    driver = webdriver.Chrome()
    driver.get(URL)
    # Selenium 4 removed find_element_by_*; locate elements with By instead
    element = driver.find_element(By.NAME, "q")
    element.send_keys(address, Keys.ARROW_DOWN)
    element.send_keys(Keys.ENTER)
    # ".r" was the class on Google result blocks at the time; Google's markup may have changed
    elems = driver.find_elements(By.CSS_SELECTOR, ".r [href]")
    link = [elem.get_attribute('href') for elem in elems]
    url = link[0]  # link to the Wikipedia page

if __name__ == "__main__":
    main()
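If the search query is already the exact title of the article, the Google step could be skipped and the Wikipedia URL built directly. Below is a minimal sketch under that assumption (English Wikipedia, title spelled exactly as the article is named); wikipedia_url is a hypothetical helper, and it does nothing about disambiguation pages or misspelled queries, which is where a search engine still helps.

```python
from urllib.parse import quote


def wikipedia_url(title):
    """Build an English Wikipedia article URL from a page title.

    Hypothetical helper: assumes `title` is the exact article title.
    """
    # Article paths use underscores in place of spaces; quote() then
    # percent-encodes any characters that are not URL-safe.
    path = quote(title.strip().replace(" ", "_"))
    return f"https://en.wikipedia.org/wiki/{path}"


print(wikipedia_url("Bill Gates"))  # https://en.wikipedia.org/wiki/Bill_Gates
```

The result can be passed straight to driver.get() in place of the URL scraped from the search results.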
And here is the relevant HTML:
<h2>
  <span class="mw-headline" id="Further_reading">Further reading</span>
</h2>
<ul>
  <li>...</li>
  <li>...</li>
  <li>...</li>
  <li>...</li>
  ...
</ul>
<h3>
  <span class="mw-headline" id="Primary_sources">Primary sources</span>
</h3>
<ul>
  <li>...</li>
  <li>...</li>
  <li>...</li>
  ...
</ul>
URL: https://en.wikipedia.org/wiki/Bill_Gates
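Once the page HTML is available, the Further reading items can also be pulled out with Python's standard-library html.parser, without XPath. This is a sketch assuming markup shaped like the snippet above: it starts collecting at the element whose id is Further_reading, takes the text of every li after it (so the Primary sources list under the h3 is included), and stops at the next h2. FurtherReadingParser and further_reading_items are hypothetical names.

```python
from html.parser import HTMLParser


class FurtherReadingParser(HTMLParser):
    """Collects the text of each <li> after the element with id="Further_reading",
    stopping at the next <h2> (the start of the following top-level section)."""

    def __init__(self):
        super().__init__()
        self.active = False  # currently inside the Further reading section?
        self.li_depth = 0    # > 0 while inside an <li>; nested <li> text joins the outer item
        self.current = []    # text fragments of the <li> being read
        self.items = []      # finished item texts

    def handle_starttag(self, tag, attrs):
        if dict(attrs).get("id") == "Further_reading":
            self.active = True          # section heading found
        elif tag == "h2" and self.active:
            self.active = False         # next top-level section reached
        elif tag == "li" and self.active:
            self.li_depth += 1

    def handle_endtag(self, tag):
        if tag == "li" and self.li_depth > 0:
            self.li_depth -= 1
            if self.li_depth == 0:
                # Collapse runs of whitespace left over from the markup.
                text = " ".join("".join(self.current).split())
                if text:
                    self.items.append(text)
                self.current = []

    def handle_data(self, data):
        if self.active and self.li_depth > 0:
            self.current.append(data)


def further_reading_items(html):
    parser = FurtherReadingParser()
    parser.feed(html)
    parser.close()
    return parser.items


sample = """
<h2><span class="mw-headline" id="Further_reading">Further reading</span></h2>
<ul><li>Book <i>One</i></li><li>Book Two</li></ul>
<h3><span class="mw-headline" id="Primary_sources">Primary sources</span></h3>
<ul><li>Letter A</li></ul>
<h2><span class="mw-headline" id="External_links">External links</span></h2>
<ul><li>Not included</li></ul>
"""
print(further_reading_items(sample))  # ['Book One', 'Book Two', 'Letter A']
```

With the Selenium driver from the question, driver.page_source returns the current page's HTML, which could be passed to further_reading_items after driver.get(url). Matching on the id attribute rather than on the span is a deliberate choice, so the sketch keeps working if the id sits on a different element than in the snippet.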
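On this page, the Further reading content sits between two h2 headings: the Further reading heading and the External links heading. To collect the text, find the ul elements that lie between those two h2 elements. This is the code that worked for me: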
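from selenium import webdriver
from selenium.webdriver.common.by import By
driver = webdriver.Chrome()
# Open the page:
driver.get('https://en.wikipedia.org/wiki/Bill_Gates')
# Find every ul between the Further reading and External links headings, then join their text:
lists = driver.find_elements(By.XPATH, "//ul[preceding-sibling::h2[./span[@id='Further_reading']] and following-sibling::h2[./span[@id='External_links']]]")
further_read = "\n".join(ul.text for ul in lists)
print(further_read)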
I hope this helps, good luck.
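The XPath keeps every ul that has the Further reading h2 among its preceding siblings and the External links h2 among its following siblings. The same rule can be written as a plain walk over sibling elements, which may make it easier to see why both lists (including the Primary sources one) are selected. This sketch uses xml.etree.ElementTree, which only parses well-formed XML, so it illustrates the rule on a tidy fragment rather than parsing raw page HTML; lists_between is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def lists_between(parent, start_id, end_id):
    """Return the <ul> children of `parent` that come after the <h2> containing
    id=start_id and before the <h2> containing id=end_id (hypothetical helper)."""
    found = []
    inside = False
    for child in parent:
        if child.tag == "h2":
            # iter() includes the h2 itself, so the id may be on it or on a descendant
            ids = {el.get("id") for el in child.iter()}
            if start_id in ids:
                inside = True
                continue
            if end_id in ids:
                break
        if inside and child.tag == "ul":
            found.append(child)
    return found


fragment = """<div>
<h2><span id="Further_reading">Further reading</span></h2>
<ul><li>Book One</li><li>Book Two</li></ul>
<h3><span id="Primary_sources">Primary sources</span></h3>
<ul><li>Letter A</li></ul>
<h2><span id="External_links">External links</span></h2>
<ul><li>Not included</li></ul>
</div>"""

root = ET.fromstring(fragment)
for ul in lists_between(root, "Further_reading", "External_links"):
    print([li.text for li in ul.iter("li")])
# ['Book One', 'Book Two']
# ['Letter A']
```

Like the XPath, the loop does not stop at intermediate headings such as the h3, which is why the Primary sources list is included.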