Reputation: 1061
I have a dataframe that contains hourly temperature data from 1990-2019 for 25 different locations. I want to count the amount of hours that a value is above or below a certain threshold and then plot that amount as a sum of the hours for every year. I know I can use a bar chart or histogram to plot, but am unsure how to aggregate the data to perform this task.
Dataframe:
time Antwerp Rotterdam ...
1990-01-01 00:00:00 2 4 ...
1990-01-01 01:00:00 3 4 ...
1990-01-01 02:00:00 2 4 ...
...
Do I need to use the groupby function?
Sample data to demonstrate:
time Antwerp Rotterdam Los Angeles
0 1990-01-01 00:00:00 0 2 15
1 1990-01-01 01:00:00 1 4 14
2 1990-01-01 02:00:00 3 5 15
3 1990-01-01 03:00:00 2 6 16
Now I am looking for the amount of hours that one city is equal to or less than 5 degrees during the year 1990. Expected output:
time Antwerp Rotterdam Los Angeles
1990 4 3 0
Ideally I would want to be able to select whatever temperature value I want.
Upvotes: 1
Views: 198
Reputation: 862601
I think you need DatetimeIndex
, compare, e.g. for greater by DataFrame.gt
and then count True
s values by aggregate sum
:
df['time'] = pd.to_datetime(df['time'])
df = df.set_index('time')
N = 2
df = df.gt(N).groupby(df.index.year).sum()
print (df)
Antwerp Rotterdam
time
1990 0.0 1.0
1991 1.0 2.0
If want low or equal use DataFrame.le
:
N = 3
df = df.le(N).groupby(df.index.year).sum()
print (df)
Antwerp Rotterdam
time
1990 1.0 0.0
1991 2.0 0.0
Upvotes: 1
Reputation: 11198
This is without using pandas functions.
# get the time column as a list by timelist = list(df['time'])
def get_hour_ud(df, threshold):
# timelist = list(df['time'])
# df['time'] = ['1990-01-01 00:00:00', '1990-01-01 01:00:00', '1990-01-01 02:00:00'] # remove this line
timelist = list(df['time'])
hour_list = [int(a.split(' ')[1].split(':')[0]) for a in timelist]
up_cnt = sum(a>threshold for a in hour_list)
low_cnt = sum(a<threshold for a in hour_list)
print(up_cnt)
print(low_cnt)
return up_cnt, low_cnt
Upvotes: 0