Matthew Attiyeh

Reputation: 5

Exporting my web scraping results to a CSV file in Python

I have searched similar questions, but none seem to address my problem, and searching Google didn't turn up an answer either. I am pretty new to programming, so please let me know if my code is not the best way to do this.

I am using Selenium to pull some data from a website, and I want to put it all in a CSV file. It works, except that my code only writes the last item to the CSV file. I think it is putting every result in cell A1 and overwriting it each time it gets a new one. My code is below, but I might be going about it completely wrong.

The first part of the code just defines variables and runs Selenium. The find_by_xpath() function is where I am running into trouble. The print(sel_test) line works fine and prints all of the results, but when I write sel_test to a CSV, I only get the last result. Any ideas?

Thanks!

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
# Import the datetime module itself; the code below uses datetime.date and datetime.timedelta
import csv, time, os, datetime, re

# Current date time in local system

today = datetime.date.today()
day = today.strftime("%d")
month = today.strftime("%m")
year = today.strftime("%y")
year_for_scrape = today.strftime("%Y")  # four-digit year
yesterday = today - datetime.timedelta(days=1)
yesterday_day_global = yesterday.strftime("%d")
print("month number is", month)
print("year is", year_for_scrape)

# Call chromedriver and navigate to a page
driver = webdriver.Chrome(chromedriver_location)
driver.get('http://pipeline.wyo.gov/ApdCompCompMenu.cfm')

# Define the XPaths (copied in Chrome via Inspect Element > Copy XPath)
beginning_month = '//*[@id="CalendarCompletiom"]/table/tbody/tr[3]/td[1]/input[1]'
beginning_day = '//*[@id="CalendarCompletiom"]/table/tbody/tr[3]/td[1]/input[2]'
beginning_yr = '//*[@id="CalendarCompletiom"]/table/tbody/tr[3]/td[1]/input[3]'
end_month = '//*[@id="CalendarCompletiom"]/table/tbody/tr[3]/td[3]/input[1]'
end_day ='//*[@id="CalendarCompletiom"]/table/tbody/tr[3]/td[3]/input[2]'
end_yr = '//*[@id="CalendarCompletiom"]/table/tbody/tr[3]/td[3]/input[3]'
go_find = '//*[@id="SubmitClassDate"]'

# Use selenium webdriver
driver.find_element_by_xpath(beginning_month).send_keys("01")
driver.find_element_by_xpath(beginning_day).send_keys("01")
driver.find_element_by_xpath(beginning_yr).send_keys(year_for_scrape)
driver.find_element_by_xpath(end_month).send_keys(month)
driver.find_element_by_xpath(end_day).send_keys(day)
driver.find_element_by_xpath(end_yr).send_keys("2020")
driver.find_element_by_xpath(go_find).click()
print("Done")


#Number of records wanted
num_rec = list(range(1, 32))

def find_by_xpath():
    #csv
    csv_file = open('Permit_scrape.csv','w', newline='')
    csv_writer=csv.writer(csv_file)
    csv_writer.writerow(['API'])

    
    for i in range(len(num_rec)):
        sel_test = driver.find_element_by_xpath('/html/body/table[2]/tbody/tr[' +str(i+6) + ']/td[2]').text
        print(sel_test)
            
    #Write to CSV
    csv_writer.writerow([sel_test])
    csv_file.close()
    print('excel_done')

find_by_xpath()

Upvotes: 0

Views: 75

Answers (2)

Alex

Reputation: 291

To append new lines to the existing file instead of overwriting it, open the file in append mode by passing either 'a' or 'a+' as the access mode.

def find_by_xpath():
    csv_file = open('Permit_scrape.csv','a', newline='')

Keep in mind that with this approach an extra empty line may be added when a new line is appended.
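
To illustrate the difference between the two access modes, here is a minimal sketch that uses only the standard library. The file name demo.csv and the helpers write_rows and read_rows are made up for the illustration; they are not from the question.

```python
import csv
import os
import tempfile

def write_rows(path, rows, mode):
    """Write each value in rows as its own one-column CSV row."""
    # newline='' lets the csv module control line endings itself
    with open(path, mode, newline='') as f:
        writer = csv.writer(f)
        for value in rows:
            writer.writerow([value])

def read_rows(path):
    """Read the whole CSV file back as a list of rows."""
    with open(path, newline='') as f:
        return list(csv.reader(f))

# Hypothetical file in a temporary directory
path = os.path.join(tempfile.mkdtemp(), 'demo.csv')

write_rows(path, ['a', 'b'], 'w')   # 'w' truncates the file, then writes
write_rows(path, ['c'], 'a')        # 'a' keeps existing rows and appends
after_append = read_rows(path)

write_rows(path, ['d'], 'w')        # 'w' again discards everything written before
after_rewrite = read_rows(path)

print(after_append)
print(after_rewrite)
```

One thing to watch for: append mode also keeps the rows from earlier runs of the script, so a header written at the top of the function would be repeated on every run unless you guard against it.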

Upvotes: 0

Spencer Bard

Reputation: 1035

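I believe your issue stems from csv_writer.writerow([sel_test]) not being inside the for loop. As written, it runs only once, after the loop has finished, so it only sees the last value assigned to sel_test.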

Try this:

def find_by_xpath():
    #csv
    csv_file = open('Permit_scrape.csv','w', newline='')
    csv_writer=csv.writer(csv_file)
    csv_writer.writerow(['API'])

    
    for i in range(len(num_rec)):
        sel_test = driver.find_element_by_xpath('/html/body/table[2]/tbody/tr[' +str(i+6) + ']/td[2]').text
        #Write to CSV
        csv_writer.writerow([sel_test])
    csv_file.close()
    print('excel_done')
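
The same idea can be tried without a browser. This is only a sketch: the list fake_cells stands in for the values that driver.find_element_by_xpath(...).text would return, and write_api_column is a hypothetical helper name. It also swaps the explicit close() for a with block, so the file is closed even if an error occurs partway through.

```python
import csv
import os
import tempfile

def write_api_column(path, values):
    """Write a header row plus one CSV row per scraped value."""
    with open(path, 'w', newline='') as csv_file:
        csv_writer = csv.writer(csv_file)
        csv_writer.writerow(['API'])
        for value in values:
            # writerow is inside the loop, so every value gets its own row
            csv_writer.writerow([value])

# Stand-in data (assumption): in the real script these come from Selenium
fake_cells = ['cell-1', 'cell-2', 'cell-3']

path = os.path.join(tempfile.mkdtemp(), 'Permit_scrape.csv')
write_api_column(path, fake_cells)

# Read the file back to check that every value was written
with open(path, newline='') as f:
    rows = list(csv.reader(f))
print(rows)
```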

Upvotes: 2
