Reputation: 25
I am trying to build a dataframe using Daily(Point(lat, lon), start_date, end_date), a function of the meteostat library that returns the daily weather statistics for the location given by Point(lat, lon) (latitude and longitude), from start_date to end_date.
The issue is that the (lat, lon) arguments need to be floats, so they indicate only one location. I want to address several locations and collect the daily meteorological data for each of them.
import meteostat
from datetime import datetime
from meteostat import Point, Daily
import matplotlib.pyplot as plt
from meteostat import Stations
import pandas as pd
import numpy
data = pd.read_csv(r'C:\Users\leoac\OneDrive\Desktop\Coding\Python apps\Correlation temp-goals in Serie A\seasons 09-19.csv', ";")
date_not_converted = data['Date']
date_being_converted = datetime.strptime(date_not_converted,'%d,%m,%Y') #1bis this can't be a Series... so I'll try changing the data type
date = date_being_converted.strftime('%Y,%m,%d')
#plot = Daily(Point(data['lat'][15],data['lon'][15]),d1,d2).fetch()
data['temp'] = Daily(Point(data['lat'][1],data['lon'][1]),datetime(date),datetime(date)).fetch() #1 fix the date format
print(data['temp']) #2 find a way to get the date and lat/lon vectors into the df
data['temp'].plot(y=['tavg'])
plt.show()
print(data)
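One concrete problem in the snippet above: datetime.strptime parses a single string, so calling it on the whole Date column fails. pd.to_datetime converts the Series in one call (the sample rows below are made up to match the csv's shape):

```python
import pandas as pd

# A couple of rows shaped like the csv (';'-separated, dd,mm,YYYY dates)
df = pd.DataFrame({"Date": ["18,02,1997", "12,07,1998"],
                   "lat": [50.3, 41.3], "lon": [-4.7, 1.5]})

# strptime expects one string, not a Series; to_datetime handles the column
df["Date"] = pd.to_datetime(df["Date"], format="%d,%m,%Y")
print(df["Date"].dt.strftime("%Y-%m-%d").tolist())
```

Each entry then becomes a Timestamp, which is what Daily accepts as a start/end date.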
Upvotes: 0
Views: 349
Reputation: 2002
Here is a solution inspired by this GitHub issue. It makes parallel requests for the different locations and then merges the results into a pandas dataframe.
from datetime import datetime
from meteostat import Point, Daily
from multiprocessing import cpu_count
from joblib import Parallel, delayed
import pandas as pd
def get_bulk_data(row):
    location = Point(row.lat, row.lon)
    data = Daily(location, row.Date, row.Date).fetch()
    data["latitude"] = row.lat
    data["longitude"] = row.lon
    return data

if __name__ == "__main__":
    df = pd.read_csv("seasons.csv", sep=";")
    df["Date"] = pd.to_datetime(df["Date"], format="%d,%m,%Y")
    executor = Parallel(n_jobs=cpu_count(), backend='multiprocessing')
    tasks = (
        delayed(get_bulk_data)(row)
        for _, row in df.iterrows()
    )
    list_of_locations_data = executor(tasks)
    data_full = pd.concat(list_of_locations_data)
    print(data_full)
It works with the following csv and date formats; you can adapt the code if yours are slightly different:
Date;lat;lon
18,02,1997;50.3;-4.7
12,07,1998;41.3;1.5
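If you want to sanity-check the pd.concat step without hitting the meteostat API, you can stub the per-location frames that get_bulk_data would return (the tavg values below are invented; the real ones come from fetch()):

```python
import pandas as pd

# Two fake per-location results shaped like Daily(...).fetch() output,
# each tagged with its coordinates as get_bulk_data does
a = pd.DataFrame({"tavg": [7.1]}, index=pd.to_datetime(["1997-02-18"]))
a["latitude"], a["longitude"] = 50.3, -4.7

b = pd.DataFrame({"tavg": [24.0]}, index=pd.to_datetime(["1998-07-12"]))
b["latitude"], b["longitude"] = 41.3, 1.5

# Same merge step as the answer: stack the per-location frames
data_full = pd.concat([a, b])
print(data_full)
```

The latitude/longitude columns are what let you tell the locations apart after concatenation, since the index alone only carries the date.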
Upvotes: 0