I need help troubleshooting some code that does the following:
1) Scrape links from a webpage
2) Scrape the text of those links from the same page
I've had some success extracting the links and writing them to a CSV file as a single column:
```python
elements = driver.find_elements_by_xpath("//a[@href]")
with open('csvfile01.csv', "w", newline='') as output:
    writer = csv.writer(output)
    for element in elements:
        writer.writerow([element.get_attribute("href")])
```
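For reference, newer Selenium releases (4.3 and later) removed the `find_elements_by_*` helpers, so the snippet above may fail there with an `AttributeError`. Below is a minimal sketch of the same single-column export using `find_elements(By.XPATH, ...)`. The function names `scrape_hrefs` and `write_single_column` are hypothetical. Selenium is imported inside `scrape_hrefs` only so that the CSV helper can be used without a browser.

```python
import csv


def write_single_column(hrefs, output):
    """Write one URL per row to an open, text-mode file object."""
    writer = csv.writer(output)
    for href in hrefs:
        writer.writerow([href])


def scrape_hrefs(url):
    """Return the href of every <a> element that has one (needs Selenium + Chrome)."""
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        elements = driver.find_elements(By.XPATH, "//a[@href]")
        return [element.get_attribute("href") for element in elements]
    finally:
        # Close the browser even if loading or locating fails.
        driver.quit()
```

Usage: pass the result of `scrape_hrefs("https://en.wikipedia.org/wiki/Main_Page")` to `write_single_column` together with a file opened via `open("csvfile01.csv", "w", newline="")`.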
Unfortunately, I got stuck when it came to:
1) getting the text for the links,
2) exporting that text as a separate column, and
3) scraping only a specific part of the webpage for links, e.g. a table ("td") or a div section.
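One way to cover all three points is to read both the link text (`element.text`) and the URL (`get_attribute("href")`) from the same element. That way the two columns cannot drift apart. The search can also be scoped with a CSS selector such as `td a[href]`. Below is a minimal sketch under those assumptions. The names `link_rows`, `write_rows`, and `scrape_links` and the example selectors are hypothetical. `link_rows` only relies on each element having `.text` and `.get_attribute()`, so it can be tried with stand-in objects.

```python
import csv


def link_rows(elements):
    """Turn link elements into [text, href] rows.

    Works with any object exposing `.text` and `.get_attribute(name)`,
    such as Selenium WebElements.
    """
    rows = []
    for element in elements:
        text = element.text.strip()
        href = element.get_attribute("href")
        rows.append([text, href])
    return rows


def write_rows(rows, output, header=("text", "link")):
    """Write the header row once, then all data rows, to an open file object."""
    writer = csv.writer(output)
    writer.writerow(header)
    writer.writerows(rows)


def scrape_links(url, css_selector="td a[href]"):
    """Collect [text, href] rows for links matching css_selector (needs Selenium)."""
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # "td a[href]" only matches links inside table cells; a selector like
        # "div#content a[href]" (hypothetical id) would restrict it to one div.
        elements = driver.find_elements(By.CSS_SELECTOR, css_selector)
        return link_rows(elements)
    finally:
        driver.quit()
```

Note that `.text` returns only the visible text. If some links come back empty, `get_attribute("textContent")` may be worth trying instead.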
The code as it stands now:
```python
from selenium import webdriver
import time
import csv

driver = webdriver.Chrome()
driver.get("https://en.wikipedia.org/wiki/Main_Page")
time.sleep(5)

columns = ['text', 'link']
e1 = driver.find_element_by_css_selector("a")
e2 = driver.find_elements_by_xpath("//a[@href]")
elements = zip(e1,e2)
time.sleep(5)

with open('csvfile01.csv', "w", newline='') as output:
    writer = csv.writer(output)
    for element in elements:
        writer.writerow(columns)
        writer.writerows(elements)
driver.quit()
```
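Looking at the script above, a few things likely go wrong:

* `find_element_by_css_selector("a")` (singular) returns one element rather than a list, so `zip(e1, e2)` has nothing to iterate over.
* The header row is written inside the loop.
* `writerows(elements)` would receive WebElement objects rather than strings, and it consumes the same iterator the loop is walking.

Below is a minimal corrected sketch that keeps the same two-lists-and-zip idea. It assumes Selenium 4 (`find_elements(By.XPATH, ...)`) and swaps the fixed `time.sleep(5)` for an explicit `WebDriverWait`. The names `write_text_link_csv` and `main` are hypothetical.

```python
import csv


def write_text_link_csv(texts, hrefs, output, columns=("text", "link")):
    """Write the header once, then one (text, link) row per pair."""
    if len(texts) != len(hrefs):
        # zip() would silently drop the extra items, so fail loudly instead.
        raise ValueError("texts and hrefs must have the same length")
    writer = csv.writer(output)
    writer.writerow(columns)             # header: written once, outside any loop
    writer.writerows(zip(texts, hrefs))  # rows hold plain strings, not WebElements


def main():
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get("https://en.wikipedia.org/wiki/Main_Page")
        # Wait up to 10 s for links to appear instead of a fixed sleep.
        links = WebDriverWait(driver, 10).until(
            EC.presence_of_all_elements_located((By.XPATH, "//a[@href]"))
        )
        # Both lists come from the same elements, so row i pairs up correctly.
        texts = [link.text for link in links]
        hrefs = [link.get_attribute("href") for link in links]
        with open("csvfile01.csv", "w", newline="") as output:
            write_text_link_csv(texts, hrefs, output)
    finally:
        driver.quit()
```

Calling `main()` runs it against the live page. `write_text_link_csv` can be exercised on its own with an `io.StringIO`.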
Any suggestions would be much appreciated. Thanks!
As far as getting the text goes, you can use `.text` on each element. Your CSS selector also doesn't look right: it is only "a", and `find_element_by_css_selector` (singular) returns just the first matching element. To get an XPath or CSS selector for a specific element, inspect it in the browser's developer tools and right-click the node. Then choose Copy, which gives you a list of things to copy (selector, XPath, and so on). I don't use Selenium much. When I did use it, though, I noticed that only one number in the XPath changed from row to row (for example, in a table of proxies). So I defined a counter and incremented it in a loop.
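The counter idea from this answer could look roughly like the sketch below. It builds an XPath in which only the row index changes, increments that index until nothing matches, and reads `.text` and the `href` from each hit. The XPath template `//table//tr[{i}]/td[1]/a` and the function name `collect_by_counter` are hypothetical. `find` stands in for something like `lambda xp: driver.find_elements(By.XPATH, xp)`, which lets the loop be tried without a browser.

```python
def collect_by_counter(find, template, start=1, limit=1000):
    """Increment a row counter inside an XPath until nothing matches.

    find:     callable taking an XPath string and returning a list of elements
              (e.g. a wrapper around driver.find_elements(By.XPATH, ...)).
    template: XPath with an {i} placeholder for the changing index
              (XPath positions are 1-based, hence start=1).
    limit:    safety cap so a bad template cannot loop forever.
    """
    rows = []
    for i in range(start, start + limit):
        matches = find(template.format(i=i))
        if not matches:
            break  # the first missing row ends the table
        element = matches[0]
        rows.append([element.text, element.get_attribute("href")])
    return rows
```

Matching all rows at once with a single `find_elements` call avoids the counter entirely. The version above simply mirrors the approach described in this answer.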