Ashley O

Reputation: 1190

Appending Scraped Data to Dataframe - Python, Selenium

I'm learning web scraping and working on Eat24 (Yelp's website). I'm able to scrape basic data from Yelp, but I can't do something pretty simple: append that data to a DataFrame. Here is my code; I've commented it, so it should be easy to follow.

from selenium import webdriver
import time
import pandas as pd
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.keys import Keys

driver = webdriver.Chrome()

#go to eat24, type in zip code 10007, choose pickup and click search

driver.get("https://new-york.eat24hours.com/restaurants/index.php")
search_area = driver.find_element_by_name("address_auto_complete")
search_area.send_keys("10007")
pickup_element = driver.find_element_by_xpath("//*[@id='search_form']/div/table/tbody/tr/td[2]")
pickup_element.click()
search_button = driver.find_element_by_xpath("//*[@id='search_form']/div/table/tbody/tr/td[3]/button")
search_button.click()


#scroll up and down on page to load more of 'infinity' list

for i in range(0,3):
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    driver.execute_script("window.scrollTo(0,0);")
    time.sleep(1)

#find menu urls

menu_urls = [page.get_attribute('href') for page in driver.find_elements_by_xpath('//*[@title="View Menu"]')]

df = pd.DataFrame(columns=['name', 'menuitems'])

#collect menu items/prices/name from each URL
for url in menu_urls:
    driver.get(url)
    menu_items = driver.find_elements_by_class_name("cpa")
    menu_items = [x.text for x in menu_items]
    menu_prices = driver.find_elements_by_class_name('item_price')
    menu_prices = [x.text for x in menu_prices]
    name = driver.find_element_by_id('restaurant_name')
    menuitems = dict(zip(menu_items, menu_prices))
    df['name'] = name
    df['menuitems'] = menuitems

df.to_csv('test.csv', index=False)

The problem is at the end: the loop doesn't add menuitems and name as successive rows in the DataFrame. I tried using .loc and other functions, but it got messy, so I removed those attempts. Any help would be appreciated!

Edit: The error I get is "ValueError: Length of values does not match length of index" when the for loop tries to add the second restaurant's name and menu items to the DataFrame.
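
For reference, the error is consistent with how pandas column assignment works: df['name'] = ... and df['menuitems'] = ... replace an entire column on every pass through the loop rather than adding a row. A minimal sketch with made-up values (no scraping involved) shows both effects: a scalar is broadcast over every existing row, overwriting it, and a list-like value must have exactly one element per row, otherwise pandas raises this ValueError. The exact wording of the message differs slightly between pandas versions.

```python
import pandas as pd

# Hypothetical stand-in for a DataFrame that already holds two rows.
df = pd.DataFrame({"name": ["Restaurant A", "Restaurant A"]})

# A scalar assigned to a column is broadcast to every existing row:
# the old values are overwritten and no new row is appended.
df["name"] = "Restaurant B"
print(df)

# A list-like assigned to a column must supply one value per row.
# Three values for a two-row frame raises the error from the question.
error = None
try:
    df["menuitems"] = ["Soup", "Salad", "Bagel"]
except ValueError as exc:
    error = exc
    print("ValueError:", exc)
```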

Upvotes: 0

Views: 1723

Answers (1)

Ashley O

Reputation: 1190

I figured out a simple solution; I'm not sure why I didn't think of it before. I added a row counter that increases by 1 on each iteration and used .loc to write the data into that row.

row = 0
for url in menu_urls:
    row += 1  # new row label for each restaurant
    driver.get(url)
    menu_items = driver.find_elements_by_class_name("cpa")
    menu_items = [x.text for x in menu_items]
    menu_prices = driver.find_elements_by_class_name('item_price')
    menu_prices = [x.text for x in menu_prices]
    # .text gives the restaurant name as a string instead of the WebElement
    name = driver.find_element_by_id('restaurant_name').text
    menuitems = [dict(zip(menu_items, menu_prices))]
    # assigning to a row label that doesn't exist yet appends a new row
    df.loc[row, 'name'] = name
    df.loc[row, 'menuitems'] = menuitems
    print(df)
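
A commonly used alternative to growing the DataFrame with .loc (a different technique from the row counter above, not the author's code): collect one plain dict per restaurant in a Python list inside the loop, then build the DataFrame once after the loop. In this sketch the Selenium calls are replaced by a hypothetical list of made-up values; scraped and rows are illustrative names.

```python
import pandas as pd

# Hypothetical scraped results: one (name, items, prices) tuple per menu URL.
# In the real loop these would come from the Selenium calls in the question.
scraped = [
    ("Restaurant A", ["Soup", "Salad"], ["$4.00", "$6.50"]),
    ("Restaurant B", ["Bagel"], ["$2.25"]),
]

rows = []  # accumulate plain dicts, one per restaurant
for name, menu_items, menu_prices in scraped:
    rows.append({"name": name, "menuitems": dict(zip(menu_items, menu_prices))})

# Build the DataFrame once, after the loop, instead of growing it row by row.
df = pd.DataFrame(rows, columns=["name", "menuitems"])
print(df)
```

In the real script, each iteration would append the dict built from the scraped restaurant name and the zipped items and prices, and df.to_csv('test.csv', index=False) would follow the DataFrame construction unchanged. If one CSV row per menu item is preferred, append one dict per (item, price) pair instead.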

Upvotes: 1
