Connor Cantrell

Web Scraping: Updating/Adding to a DataFrame Using pandas

I am writing a web scraping program using Python, pandas, and BeautifulSoup. I want it to request wind information from a weather station every 10 minutes and store the readings in an array covering the last 24 hours, which at one reading every 10 minutes means 144 entries.
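At one reading every 10 minutes, 24 hours works out to 24 × 60 / 10 = 144 readings. A minimal sketch of such a fixed-size history, assuming each reading is a dict like the rows the code below builds, uses `collections.deque` with `maxlen`: `appendleft` puts the newest reading at the front, and once the buffer is full the oldest reading is dropped automatically. The helper `add_reading` and the simulated readings are hypothetical.

```python
from collections import deque

READINGS_PER_HOUR = 60 // 10              # one reading every 10 minutes
MAX_READINGS = 24 * READINGS_PER_HOUR     # 24 hours of history -> 144 slots

# A deque with maxlen discards items from the opposite end when it is full,
# so appendleft() keeps the newest reading at index 0.
history = deque(maxlen=MAX_READINGS)


def add_reading(history, reading):
    """Hypothetical helper: push the newest reading onto the front."""
    history.appendleft(reading)


# Simulate 150 scrapes; only the most recent 144 are kept.
for i in range(150):
    add_reading(history, {"Station": "Montara", "Wind Speed": i})

print(len(history))               # 144
print(history[0]["Wind Speed"])   # 149 (newest)
print(history[-1]["Wind Speed"])  # 6 (oldest reading still kept)
```

This covers the "push/pop" idea without any explicit pop: the `maxlen` bound does the popping.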

So far I have managed to build a messy DataFrame of the current conditions. I have three questions; the third may be beyond my abilities at this point.

1: How do I exclude the newline characters ('\n') from my data while parsing?

2: How do I scrape the page again every 10 minutes and add each new reading to the array?

3: How do I push new data onto the array and pop old data off it, so that the most recent reading is always at the front? (I have read about push and pop, so this may be something I can research later.)
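For question 1, one hedged option, shown on plain strings so it runs without a network request, is to collapse every run of whitespace (newlines, tabs, repeated spaces) into a single space with `" ".join(text.split())`. With BeautifulSoup tag objects, `item.get_text(" ", strip=True)` is another option. The helper name `clean_text` and the sample strings are made up for illustration, not real wunderground output.

```python
def clean_text(raw):
    """Collapse newlines, tabs and repeated spaces into single spaces."""
    # str.split() with no argument splits on any whitespace run and
    # drops leading/trailing whitespace, so joining with " " normalizes it.
    return " ".join(raw.split())


samples = ["\n12\n", "\nWSW\n", "\nMontara\nStation\n"]  # made-up examples
cleaned = [clean_text(s) for s in samples]
print(cleaned)  # ['12', 'WSW', 'Montara Station']
```

Unlike `replace("\n", "")`, this keeps a space between words that were separated only by a newline.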

This is the first code I have written myself, so please excuse any rough edges. My code is below, and here is a link to an image of my output: https://i.sstatic.net/XIN3Z.png

import pandas as pd
import requests
from bs4 import BeautifulSoup

url = 'https://www.wunderground.com/weather/us/ca/montara'
page = requests.get(url)
soup = BeautifulSoup(page.text, 'html.parser')

#read wind speed
montara_wSpeed = []
montara_wSpeed_elem = soup.find_all(class_='wind-speed')

for item in montara_wSpeed_elem:
    montara_wSpeed.append(item.text)

#read wind direction
montara_wCompass = []
montara_wCompass_elem = soup.find_all(class_='wind-compass')

for item in montara_wCompass_elem:
    montara_wCompass.append(item.text)

#read wind station
montara_station = []
montara_station_elem = soup.find_all(class_='station-nav')

for item in montara_station_elem:
    montara_station.append(item.text)

#create dataframe
montara_array = []

for station, windCompass, windSpeed in zip(montara_station, montara_wCompass, montara_wSpeed):
    montara_array.append({'Station': station, 'Wind Direction': windCompass, 'Wind Speed': windSpeed})

df = pd.DataFrame(montara_array)
df
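For question 3 at the DataFrame level, one possible approach, assuming each scrape yields a list of row dicts like `montara_array`, is to stamp each batch with the time it was scraped, put the new rows in front with `pd.concat`, and keep only the first 144 rows with `head`. The `Scraped At` column and the `prepend_scrape` helper are hypothetical additions, not part of the original code, and the rows below are invented sample data.

```python
from datetime import datetime, timedelta

import pandas as pd

MAX_ROWS = 144  # 24 hours of readings at one reading every 10 minutes


def prepend_scrape(df, rows, scraped_at, max_rows=MAX_ROWS):
    """Hypothetical helper: put a new batch of rows in front of df."""
    new = pd.DataFrame(rows)
    new["Scraped At"] = scraped_at
    if df is None or df.empty:
        combined = new
    else:
        # New rows first, so the most recent data sits at the top.
        combined = pd.concat([new, df], ignore_index=True)
    # Drop the oldest rows once the history is longer than max_rows.
    return combined.head(max_rows).reset_index(drop=True)


history = None
start = datetime(2020, 1, 1)
for i in range(3):
    rows = [{"Station": "Montara", "Wind Direction": "W", "Wind Speed": str(10 + i)}]
    history = prepend_scrape(history, rows, start + timedelta(minutes=10 * i))

print(history)
```

Starting from `None` avoids concatenating onto an empty DataFrame, which newer pandas versions warn about.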


Answers (1)

Upasana Mittal

Here is your code with only small modifications: str.replace strips the newlines, and a loop with time.sleep scrapes the page repeatedly and collects the results:

import time

import pandas as pd
import requests
from bs4 import BeautifulSoup

url = 'https://www.wunderground.com/weather/us/ca/montara'

def scrap_weather(url):
    page = requests.get(url)
    soup = BeautifulSoup(page.text, 'html.parser')

    #read wind speed
    montara_wSpeed = []
    montara_wSpeed_elem = soup.find_all(class_='wind-speed')

    for item in montara_wSpeed_elem:
        montara_wSpeed.append(item.text.replace("\n",""))

    # read wind direction
    montara_wCompass = []
    montara_wCompass_elem = soup.find_all(class_='wind-compass')

    for item in montara_wCompass_elem:
        montara_wCompass.append(item.text.replace("\n",""))

    #read wind station
    montara_station = []
    montara_station_elem = soup.find_all(class_='station-nav')

    for item in montara_station_elem:
        montara_station.append(item.text.replace("\n"," "))

    #create dataframe
    montara_array = []

    for station, windCompass, windSpeed in zip(montara_station, montara_wCompass, montara_wSpeed):
        montara_array.append({'Station': station, 'Wind Direction': windCompass, 'Wind Speed': windSpeed})

    return montara_array

n_times = 3
data = []
for i in range(n_times):
    montara_array = scrap_weather(url)
    # Put the newest readings in front of everything collected so far
    data = montara_array + data
    print(data)
    if i < n_times - 1:
        time.sleep(600)  # wait 10 minutes before the next scrape
df = pd.DataFrame(data)
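For question 2, sleeping a fixed 600 seconds after each scrape lets the interval drift by however long each request takes. A hedged alternative, sketched with the standard library only, schedules each run against `time.monotonic()` so scrapes stay close to 10 minutes apart. `poll` is a hypothetical helper, `fetch` stands in for something like `lambda: scrap_weather(url)`, and the `sleep`/`clock` parameters exist only so the loop can be dry-run without actually waiting.

```python
import time


def poll(fetch, interval=600, n_times=3, sleep=time.sleep, clock=time.monotonic):
    """Call fetch() n_times, roughly every `interval` seconds, newest results first."""
    data = []
    next_run = clock()
    for i in range(n_times):
        # Prepend so the most recent readings are at the front of the list.
        data = fetch() + data
        if i < n_times - 1:
            next_run += interval
            # Sleep only for what is left of this interval (never negative).
            sleep(max(0.0, next_run - clock()))
    return data


class FakeClock:
    """Stand-in clock for a quick dry run; sleep() just advances the time."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds


clock = FakeClock()
counter = iter(range(100))
result = poll(lambda: [{"Wind Speed": next(counter)}], interval=600, n_times=3,
              sleep=clock.sleep, clock=clock)
print(result)     # [{'Wind Speed': 2}, {'Wind Speed': 1}, {'Wind Speed': 0}]
print(clock.now)  # 1200.0
```

In real use, `pd.DataFrame(poll(lambda: scrap_weather(url)))` would give the same newest-first table as the loop above.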
