Plot frequency of dataframe value per year

Question

I have a dataframe that contains hourly temperature data from 1990-2019 for 25 different locations. I want to count the number of hours in which a value is above or below a certain threshold and then plot that count as a total for every year. I know I can use a bar chart or histogram to plot, but I am unsure how to aggregate the data to perform this task.

Dataframe:

time                 Antwerp       Rotterdam  ...
1990-01-01 00:00:00  2             4          ...
1990-01-01 01:00:00  3             4          ...
1990-01-01 02:00:00  2             4          ...
...

Do I need to use the groupby function?

Sample data to demonstrate:

                  time  Antwerp  Rotterdam  Los Angeles
0  1990-01-01 00:00:00        0          2           15
1  1990-01-01 01:00:00        1          4           14
2  1990-01-01 02:00:00        3          5           15
3  1990-01-01 03:00:00        2          6           16

Now I am looking for the number of hours in which each city is at or below 5 degrees during the year 1990. Expected output:

time    Antwerp Rotterdam   Los Angeles
1990    4       3           0

Ideally I would want to be able to select whatever temperature value I want.

jezrael · Accepted Answer

I think you need a DatetimeIndex. Then compare the values, e.g. with DataFrame.gt for greater than, and count the True values by aggregating with sum:

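import pandas as pd

# Parse the timestamps and use them as the index
df['time'] = pd.to_datetime(df['time'])
df = df.set_index('time')

N = 2
# Store the result under a new name so df stays available for other thresholds
df1 = df.gt(N).groupby(df.index.year).sum()
print(df1)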
      Antwerp  Rotterdam
time                    
1990      0.0        1.0
1991      1.0        2.0
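
To make the threshold and the direction of the comparison selectable, the same idea can be wrapped in a small helper. This is only a sketch: the function name hours_per_year and its compare parameter are hypothetical, and the data is the four sample rows from the question rather than the full 1990-2019 set.

```python
import pandas as pd

# Sample rows copied from the question (assumption: real data has the same layout).
df = pd.DataFrame(
    {
        "time": [
            "1990-01-01 00:00:00",
            "1990-01-01 01:00:00",
            "1990-01-01 02:00:00",
            "1990-01-01 03:00:00",
        ],
        "Antwerp": [0, 1, 3, 2],
        "Rotterdam": [2, 4, 5, 6],
        "Los Angeles": [15, 14, 15, 16],
    }
)
df["time"] = pd.to_datetime(df["time"])
df = df.set_index("time")


def hours_per_year(frame, threshold, compare="le"):
    """Hypothetical helper: count, per year and per column, the rows
    that satisfy the comparison against `threshold`.

    `compare` names a DataFrame comparison method: "gt", "ge", "lt" or "le".
    """
    if compare not in {"gt", "ge", "lt", "le"}:
        raise ValueError(f"unsupported comparison: {compare!r}")
    # Boolean frame: True where the value satisfies the comparison
    mask = getattr(frame, compare)(threshold)
    # Summing booleans per year counts the True values
    return mask.groupby(frame.index.year).sum()


above_2 = hours_per_year(df, 2, "gt")
print(above_2)
```

Because the helper never reassigns df, it can be called repeatedly with different thresholds on the same time-indexed frame.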

If you want less than or equal, use DataFrame.le:

N = 3
# df must be the time-indexed frame here, not an already aggregated result
df2 = df.le(N).groupby(df.index.year).sum()
print(df2)
      Antwerp  Rotterdam
time                    
1990      1.0        0.0
1991      2.0        0.0
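
For the plotting part of the question, the yearly counts can be passed to DataFrame.plot as a bar chart. The following is a minimal sketch, assuming matplotlib is installed and using the four sample rows from the question with a threshold of 5; the function name plot_counts and the file name hours_per_year.png are hypothetical.

```python
import pandas as pd

# Sample rows copied from the question (assumption: real data has the same layout).
df = pd.DataFrame(
    {
        "time": [
            "1990-01-01 00:00:00",
            "1990-01-01 01:00:00",
            "1990-01-01 02:00:00",
            "1990-01-01 03:00:00",
        ],
        "Antwerp": [0, 1, 3, 2],
        "Rotterdam": [2, 4, 5, 6],
        "Los Angeles": [15, 14, 15, 16],
    }
)
df["time"] = pd.to_datetime(df["time"])
df = df.set_index("time")

# Hours at or below 5 degrees, summed per year
counts = df.le(5).groupby(df.index.year).sum()
counts.index.name = "year"
print(counts)


def plot_counts(counts, path="hours_per_year.png"):
    """Hypothetical helper: grouped bar chart, one group per year and
    one bar per city, saved to `path` (requires matplotlib)."""
    ax = counts.plot(kind="bar", rot=0)
    ax.set_xlabel("Year")
    ax.set_ylabel("Hours at or below 5 degrees")
    ax.figure.savefig(path)
    return ax


if __name__ == "__main__":
    plot_counts(counts)
```

On the sample rows the counts match the expected output in the question (4, 3 and 0 for 1990). With the full data set each year becomes one group of bars on the x-axis.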
