user7875084

Reputation: 107

csvWriter in Python Selenium is not iterating through 0 to j - it only writes the last hit, not each hit

I am having trouble capturing all of the XPath hits. I am telling it to take all of the elements from 0 to j (j = 20, the length of the container) that match //*[@id='tabs-1']/div[3]/table/tbody/tr[2]/td and //*[@id='tabs-1']/div[3]/table/tbody/tr[1]/td[3]. However, when it cycles through j, it only seems to write the very last one into the csv file. Is this a problem with the way the csvWriter is coded? I want to take all of the hits and put them into separate rows in a csv file, with each row holding a hit for both path queries (spread across two columns) and each j getting its own row.

Also, how would I code it so that the csv appends to the already existing rows when it cycles to the next page (for i in range(0, num_pages)) and repeats the process? Thanks for your help!

import sys
import csv
from selenium import webdriver
import time
import pandas as pd
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC 


 
# default path to file to store data
path_to_file = "/Users/D/Desktop/reviews.csv"

# default number of scraped pages
num_page = 3

# default tripadvisor website of hotel or things to do (attraction/monument) 
url = "https://www.tripadvisor.com/Attraction_Review-g187791-d192285-Reviews-Colosseum-Rome_Lazio.html"

# if you pass the inputs in the command line
if (len(sys.argv) == 4):
    path_to_file = sys.argv[1]
    num_page = int(sys.argv[2])
    url = sys.argv[3]

# import the webdriver
driver = webdriver.Safari()
driver.get(url)

# open the file to save the review
csvFile = open(path_to_file, 'a', encoding="utf-8")
csvWriter = csv.writer(csvFile)

# change the value inside the range to save more or less reviews

for i in range(0, num_page):
    name = []
    start=[]
    # expand the review
    time.sleep(2)
    container = driver.find_elements_by_xpath("//*[@id='tabs-1']/div[3]/table/tbody")
    
    for j in range(len(container)):
        name = container[j].find_element_by_xpath(".//tr[2]/td").text
        start = container[j].find_element_by_xpath(".//tr[1]/td[3]").text
        
# name of csv file  
        filename = path_to_file
    
# writing to csv file  
        with open(filename, 'w') as csvfile:  
    # creating a csv writer object  
            csvwriter = csv.writer(csvfile)   
    # writing the data rows  
            csvwriter.writerow([name, start])

        driver.find_element_by_xpath("//*[@id='tabs-1']/div[2]/a[@accesskey='n']").click()
     

driver.quit()

Upvotes: 0

Views: 88

Answers (1)

Lesmana

Reputation: 27063

In each iteration you are overwriting the old contents of the file. That is why only the last iteration survives.

This line

with open(filename, 'w') as csvfile:

opens the file in write mode and truncates (empties) it, discarding everything written so far.
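
A quick way to see the truncation (a standalone demo, not from the question):

with open("demo.txt", "w") as f:
    f.write("first\n")
# reopening with 'w' empties the file before anything is written
with open("demo.txt", "w") as f:
    f.write("last\n")
# demo.txt now contains only "last" - "first" was truncated away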

To append instead of overwrite, use 'a' in place of 'w'.
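
Applied to the write inside the question's loop, the minimal fix looks like this (newline='' is an addition here; it is what the csv module docs recommend when handing a file object to csv.writer):

# 'a' appends, so each iteration's row survives
with open(filename, 'a', newline='') as csvfile:
    csvwriter = csv.writer(csvfile)
    csvwriter.writerow([name, start])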

see https://docs.python.org/3/library/functions.html#open

Or, for better performance, open the file once, outside of the loop:

with open(filename, 'w') as csvfile:
    csvwriter = csv.writer(csvfile)
    # the file stays open for the whole loop, so 'w' truncates only once
    for j in range(len(container)):
        ...
        csvwriter.writerow([name, start])

This might not matter much here, because Selenium is likely far slower than the repeated opens, but it is always kinder to your system to use open sparingly.
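
Putting both parts of the question together, here is a sketch of the whole loop with the file opened once before the page loop. It reuses the question's XPaths, Safari driver, and the find_elements_by_xpath API from the question's Selenium version; moving the "next page" click out of the inner loop is an assumption about what was intended:

import csv
import time
from selenium import webdriver

path_to_file = "/Users/D/Desktop/reviews.csv"
num_page = 3
url = "https://www.tripadvisor.com/Attraction_Review-g187791-d192285-Reviews-Colosseum-Rome_Lazio.html"

driver = webdriver.Safari()
driver.get(url)

# open once; every writerow below becomes its own csv row
with open(path_to_file, 'w', newline='', encoding='utf-8') as csvfile:
    csvwriter = csv.writer(csvfile)
    for i in range(num_page):
        time.sleep(2)  # crude wait for the page to render
        containers = driver.find_elements_by_xpath("//*[@id='tabs-1']/div[3]/table/tbody")
        for container in containers:
            name = container.find_element_by_xpath(".//tr[2]/td").text
            start = container.find_element_by_xpath(".//tr[1]/td[3]").text
            csvwriter.writerow([name, start])
        # advance to the next page once per outer iteration, after all rows
        # on the current page are written (raises if no next link exists)
        driver.find_element_by_xpath("//*[@id='tabs-1']/div[2]/a[@accesskey='n']").click()

driver.quit()

With 'w' the file is rebuilt from scratch on every run; if rows should instead accumulate across separate runs of the script, open with 'a' as shown above.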

Upvotes: 1
