I am creating a web scraping program using Python, pandas, and BeautifulSoup. I want it to request wind information from a weather station every 10 minutes and store the data in an array of 72 entries. (At one reading every 10 minutes, 72 entries span 12 hours; a full 24 hours would need 144.)
So far I have managed to create a messy dataframe with the current conditions. I have three questions; the third might be beyond my abilities at this time.
1: How do I exclude the newline characters ('\n') from my data while parsing?
2: How do I update every 10 minutes and add each new reading to the array?
3: How do I push/pop data from the array so that the most recent data is at the front? (I have read about push and pop, and this may be something I can research in the future.)
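For question 3, Python's standard library already has a structure that behaves like this: a `collections.deque` created with `maxlen` discards items from the opposite end once it is full, and `appendleft` puts the newest item at index 0. A minimal sketch, assuming each reading is a dict like the ones the code below builds; the `add_reading` helper and the sample values are hypothetical.

```python
from collections import deque

# 72 slots, as in the question; at one reading every 10 minutes this covers 12 hours
MAX_READINGS = 72

# Once full, a deque with maxlen drops items from the end opposite the one you add to
history = deque(maxlen=MAX_READINGS)


def add_reading(history, reading):
    """Hypothetical helper: newest reading goes to index 0, oldest falls off the right."""
    history.appendleft(reading)


# Hypothetical readings, numbered so the order is easy to check
for n in range(100):
    add_reading(history, {'Station': 'example', 'Wind Speed': n})

print(len(history))               # 72
print(history[0]['Wind Speed'])   # 99 (most recent)
print(history[-1]['Wind Speed'])  # 28 (oldest reading still kept)
```

If you need to remove the oldest entry explicitly, `history.pop()` takes it from the right end, and `list(history)` gives a plain list of dicts that can be passed to `pd.DataFrame`.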
This is the first code I have written myself, so please bear with me. My code is below, and here's a link to an image displaying my output.
import pandas as pd
import requests
from bs4 import BeautifulSoup

url = 'https://www.wunderground.com/weather/us/ca/montara'
page = requests.get(url)
soup = BeautifulSoup(page.text, 'html.parser')

#read wind speed
montara_wSpeed = []
montara_wSpeed_elem = soup.find_all(class_='wind-speed')
for item in montara_wSpeed_elem:
    montara_wSpeed.append(item.text)

#read wind direction
montara_wCompass = []
montara_wCompass_elem = soup.find_all(class_='wind-compass')
for item in montara_wCompass_elem:
    montara_wCompass.append(item.text)

#read wind station
montara_station = []
montara_station_elem = soup.find_all(class_='station-nav')
for item in montara_station_elem:
    montara_station.append(item.text)

#create dataframe
montara_array = []
for station, windCompass, windSpeed in zip(montara_station, montara_wCompass, montara_wSpeed):
    montara_array.append({'Station': station, 'Wind Direction': windCompass, 'Wind Speed': windSpeed})
df = pd.DataFrame(montara_array)
df  # displays the dataframe in a Jupyter notebook
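For question 1, the stray characters are newline characters ('\n') inside the tag text. Besides `str.replace`, a common approach is to split on any run of whitespace and rejoin with single spaces, which also removes leading and trailing spaces and tabs. A minimal sketch; `clean_text` is a hypothetical helper and the sample strings are made up, not real Wunderground output.

```python
def clean_text(raw):
    """Collapse newlines, tabs and repeated spaces into single spaces."""
    # str.split() with no argument splits on any run of whitespace
    # and drops the empty strings at both ends
    return " ".join(raw.split())


# Hypothetical scraped values containing newlines
samples = ["\n12\n mph\n", "\nWSW\n", "\nExample Station\n\nDetails\n"]
cleaned = [clean_text(s) for s in samples]
print(cleaned)  # ['12 mph', 'WSW', 'Example Station Details']
```

In the loops above this would be `montara_wSpeed.append(clean_text(item.text))`. BeautifulSoup can also do part of this itself: `item.get_text(" ", strip=True)` strips whitespace from each piece of text and joins the pieces with a space.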
Here I tried this using replace, time.sleep and extend, and I have only modified your code:
import time

import pandas as pd
import requests
from bs4 import BeautifulSoup

url = 'https://www.wunderground.com/weather/us/ca/montara'

def scrap_weather(url):
    page = requests.get(url)
    soup = BeautifulSoup(page.text, 'html.parser')
    #read wind speed (newlines removed)
    montara_wSpeed = []
    montara_wSpeed_elem = soup.find_all(class_='wind-speed')
    for item in montara_wSpeed_elem:
        montara_wSpeed.append(item.text.replace("\n", ""))
    # read wind direction (newlines removed)
    montara_wCompass = []
    montara_wCompass_elem = soup.find_all(class_='wind-compass')
    for item in montara_wCompass_elem:
        montara_wCompass.append(item.text.replace("\n", ""))
    #read wind station (newlines replaced by spaces)
    montara_station = []
    montara_station_elem = soup.find_all(class_='station-nav')
    for item in montara_station_elem:
        montara_station.append(item.text.replace("\n", " "))
    #create one dict per station
    montara_array = []
    for station, windCompass, windSpeed in zip(montara_station, montara_wCompass, montara_wSpeed):
        montara_array.append({'Station': station, 'Wind Direction': windCompass, 'Wind Speed': windSpeed})
    return montara_array

n_times = 3
data = []  # all readings so far, newest first
for i in range(n_times):
    montara_array = scrap_weather(url)
    # put the older readings behind the newest one
    montara_array.extend(data)
    data = montara_array
    print(data)
    if i < n_times - 1:
        time.sleep(600)  # wait 10 minutes before the next request

data = pd.DataFrame(data)
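Calling `time.sleep(600)` after each request makes every cycle slightly longer than 10 minutes, because the request and parsing time are added on top, so the schedule slowly drifts. If that matters, one option is to schedule against a fixed start time using `time.monotonic()`. A minimal sketch; `run_every` and its `clock`/`sleep` parameters are hypothetical names, and the injectable clock exists only so the loop can be tested without actually waiting.

```python
import time


def run_every(interval, task, n_times, clock=time.monotonic, sleep=time.sleep):
    """Hypothetical helper: call task() n_times, starting call k at start + k * interval."""
    results = []
    start = clock()
    for k in range(n_times):
        results.append(task())
        if k == n_times - 1:
            break  # no need to wait after the last call
        # Sleep only until the next slot, so time spent inside task() is absorbed
        next_slot = start + (k + 1) * interval
        delay = next_slot - clock()
        if delay > 0:
            sleep(delay)
    return results
```

With the scraper from the answer this would be used as `readings = run_every(600, lambda: scrap_weather(url), n_times=3)`. If a request takes longer than the interval, the next call simply starts immediately instead of sleeping a negative amount.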