Reputation:
I have a year's worth of data, and this is a sample of how it is formatted: sample of data
For each lat and lon pair, I need to compute statistics over a time period. How do I do this?
For example: there are thousands of temperature values specifically at lat = 25.313 and lon = -108.813. The data is mapped on a grid of the US, and at each particular lat and lon I want to compute statistics on temperature over time. I have not done something like this before and need to figure out a method.
I have not worked with data represented like this before, and I am going to search the web to see how to do it. This is more or less a request for advice, since my initial results seem to be lacking.
Thanks!
Edit: I found a way to get each data point at a particular (hourly) time step, so I horizontally reorganized the variable I wanted. All I need to do now is run my averages across each row.
import os
import pandas as pd

finaldf = pd.DataFrame()
directory = "C:/Users/truet/OneDrive/Desktop/test"  # change last directory

for filename in os.listdir(directory):
    fullpath = os.path.join(directory, filename)
    if os.path.isfile(fullpath) and fullpath.endswith(".csv"):
        # read only the column you want to export (here, column index 4)
        dfchild = pd.read_csv(fullpath, usecols=[4])
        # append it as a new column of the combined frame
        finaldf = pd.concat([finaldf, dfchild.reset_index(drop=True)], axis=1)

finaldf.to_csv("C:/Users/truet/OneDrive/Desktop/test.csv", index=False)
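For the row-wise averages mentioned in the edit, a minimal sketch (the column names and values here are made up; it only assumes each column of the combined frame holds one time step):

```python
import pandas as pd

# hypothetical frame: each column is one hourly time step,
# each row is one grid point
finaldf = pd.DataFrame({
    "t0": [20.0, 25.0],
    "t1": [22.0, 27.0],
    "t2": [24.0, 29.0],
})

# mean across each row, i.e. the time average at each grid point
row_means = finaldf.mean(axis=1)
print(row_means.tolist())  # → [22.0, 27.0]
```

`axis=1` tells pandas to reduce across columns, giving one value per row.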
Upvotes: 0
Views: 741
Reputation: 2714
That's a pretty broad question, since you could be doing anything with your data, but in general you'll probably want to make use of the groupby
method, which splits your table into groups and lets you apply the same statistical method to each group.
import pandas as pd
# read in your data from whatever form (e.g. csv file)
df = pd.read_csv('data.csv')
# group the data by each lat-lon pair:
df_groups = df.groupby(['lat', 'lon'])
# apply a method of your choice:
df_groups.sum()
df_groups.count()
df_groups.mean()
df_groups.std()
# or apply a user-defined function
df_groups.agg(lambda x: x*5 / 100)
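To make that concrete, here is a small made-up table (the column names `lat`, `lon`, and `temp` are assumptions about your layout) with several statistics computed at once per lat-lon pair:

```python
import pandas as pd

# made-up sample resembling the described grid layout
df = pd.DataFrame({
    "lat":  [25.313, 25.313, 26.0, 26.0],
    "lon":  [-108.813, -108.813, -107.5, -107.5],
    "temp": [20.0, 24.0, 30.0, 34.0],
})

# mean, standard deviation, and count of temp per lat-lon pair
stats = df.groupby(["lat", "lon"])["temp"].agg(["mean", "std", "count"])
print(stats)
```

Each row of `stats` is one lat-lon pair, so this collapses the thousands of temperature values at each grid point down to one row of summary statistics.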
Upvotes: 0