Reputation: 107
I am having trouble capturing all of the XPath hits. I tell it to take all of the elements from 0 to j (j = 20, the length of the container) for which there is an XPath hit for //*[@id='tabs-1']/div[3]/table/tbody/tr[2]/td and for //*[@id='tabs-1']/div[3]/table/tbody/tr[1]/td[3]. However, when it cycles through j, only the very last hit gets written to the CSV file. Is this a problem with the way the csvWriter is coded? I want to put all of the hits into separate rows of a CSV file, with each j getting its own row and each row holding a hit for both path queries, spread across two columns.
Also, how would I code it so that the CSV adds to the already existing rows when the script cycles to the next page (for i in range(0, num_pages)) and repeats the process? Thanks for your help!
import sys
import csv
from selenium import webdriver
import time
import pandas as pd
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
# default path to file to store data
path_to_file = "/Users/D/Desktop/reviews.csv"
# default number of scraped pages
num_page = 3
# default tripadvisor website of hotel or things to do (attraction/monument)
url = "https://www.tripadvisor.com/Attraction_Review-g187791-d192285-Reviews-Colosseum-Rome_Lazio.html"
# if you pass the inputs in the command line
if (len(sys.argv) == 4):
    path_to_file = sys.argv[1]
    num_page = int(sys.argv[2])
    url = sys.argv[3]
# import the webdriver
driver = webdriver.Safari()
driver.get(url)
# open the file to save the review
csvFile = open(path_to_file, 'a', encoding="utf-8")
csvWriter = csv.writer(csvFile)
# change the value inside the range to save more or less reviews
for i in range(0, num_page):
    name = []
    start = []
    # expand the review
    time.sleep(2)
    container = driver.find_elements_by_xpath("//*[@id='tabs-1']/div[3]/table/tbody")
    for j in range(len(container)):
        name = container[j].find_element_by_xpath(".//tr[2]/td").text
        start = container[j].find_element_by_xpath(".//tr[1]/td[3]").text
        # name of csv file
        filename = path_to_file
        # writing to csv file
        with open(filename, 'w') as csvfile:
            # creating a csv writer object
            csvwriter = csv.writer(csvfile)
            # writing the data rows
            csvwriter.writerow([name, start])
    driver.find_element_by_xpath("//*[@id='tabs-1']/div[2]/a[@accesskey='n']").click()
driver.quit()
Upvotes: 0
Views: 88
Reputation: 27063
In each iteration you are overwriting the old contents of the file; that is why only the last iteration survives.
This line
with open(filename, 'w') as csvfile:
opens the file and truncates (removes) its contents. To append, use 'a' instead of 'w'.
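A minimal demonstration of the difference, using a throwaway file in the temp directory (the path and rows are made up for illustration):

```python
import csv
import os
import tempfile

# Hypothetical demo file, not the asker's reviews.csv.
path = os.path.join(tempfile.gettempdir(), "mode_demo.csv")

# Mode 'w' truncates on every open: each pass discards the previous row.
for row in (["a", 1], ["b", 2]):
    with open(path, "w", newline="") as f:
        csv.writer(f).writerow(row)
with open(path, newline="") as f:
    print(list(csv.reader(f)))  # only the last row survives

# Mode 'a' appends: rows accumulate across opens.
os.remove(path)
for row in (["a", 1], ["b", 2]):
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow(row)
with open(path, newline="") as f:
    print(list(csv.reader(f)))  # both rows are present
```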
See https://docs.python.org/3/library/functions.html#open
Or, for better performance, open the file once outside of the loop:
with open(filename, 'w') as csvfile:
    csvwriter = csv.writer(csvfile)
    for j in range(len(container)):
        ...
        csvwriter.writerow([name, start])
This might not matter much here, because Selenium is likely far slower than the repeated opens, but it is always nice to your system if you use open sparingly.
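To show the open-once pattern end to end without a browser, here is a sketch where the Selenium calls are replaced by a made-up fake_scrape_page function (that name and the demo path are illustration only, not the asker's code): the file is opened a single time before both loops, and every writerow call appends one row, so rows from later pages land under rows from earlier pages.

```python
import csv

def fake_scrape_page(page):
    # Stand-in for the Selenium lookups: one (name, start) pair per review.
    return [(f"review {page}-{j}", f"rating {j}") for j in range(3)]

num_page = 3
path_to_file = "reviews_demo.csv"  # hypothetical demo path

# Open once, before both loops; the writer then appends row after row.
with open(path_to_file, "w", newline="", encoding="utf-8") as csvfile:
    csvwriter = csv.writer(csvfile)
    for i in range(num_page):
        for name, start in fake_scrape_page(i):
            csvwriter.writerow([name, start])
# reviews_demo.csv now holds one row per review across all pages.
```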
Upvotes: 1