Eli Turasky

Reputation: 1061

Plot frequency of dataframe value per year

I have a dataframe that contains hourly temperature data from 1990-2019 for 25 different locations. I want to count the number of hours in which a value is above or below a certain threshold, and then plot that count for every year. I know I can use a bar chart or histogram to plot, but I am unsure how to aggregate the data to perform this task.

Dataframe:

time                 Antwerp       Rotterdam  ...
1990-01-01 00:00:00  2             4          ...
1990-01-01 01:00:00  3             4          ...
1990-01-01 02:00:00  2             4          ...
...              

Do I need to use the groupby function?

Sample data to demonstrate:

                  time  Antwerp  Rotterdam  Los Angeles
0  1990-01-01 00:00:00        0          2           15
1  1990-01-01 01:00:00        1          4           14
2  1990-01-01 02:00:00        3          5           15
3  1990-01-01 03:00:00        2          6           16

Now I am looking for the number of hours during 1990 in which each city is at or below 5 degrees. Expected output:

time    Antwerp Rotterdam   Los Angeles
1990    4       3           0

Ideally, I would like to be able to choose any temperature threshold.

Upvotes: 1

Views: 198

Answers (2)

jezrael

Reputation: 862601

I think you need a DatetimeIndex. Compare the values against the threshold, e.g. with DataFrame.gt for "greater than", and then count the True values by aggregating with sum:

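import pandas as pd

# convert the time column to datetimes and use it as the index
df['time'] = pd.to_datetime(df['time'])
df = df.set_index('time')

N = 2
# keep df intact so it can be reused with other thresholds
above = df.gt(N).groupby(df.index.year).sum()
print (above)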
      Antwerp  Rotterdam
time                    
1990      0.0        1.0
1991      1.0        2.0

For less than or equal, use DataFrame.le:

N = 3
# store the result under a new name instead of overwriting df
below = df.le(N).groupby(df.index.year).sum()
print (below)
      Antwerp  Rotterdam
time                    
1990      1.0        0.0
1991      2.0        0.0
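
As a minimal sketch of the same approach applied to the four sample rows from the question, the snippet below wraps the comparison in a function so the threshold can be chosen freely. The names hours_per_year and plot_hours are hypothetical helpers introduced only for illustration, and the plotting helper assumes matplotlib is installed.

```python
import pandas as pd

# Sample rows copied from the question; the real data would span 1990-2019.
df = pd.DataFrame({
    "time": ["1990-01-01 00:00:00", "1990-01-01 01:00:00",
             "1990-01-01 02:00:00", "1990-01-01 03:00:00"],
    "Antwerp": [0, 1, 3, 2],
    "Rotterdam": [2, 4, 5, 6],
    "Los Angeles": [15, 14, 15, 16],
})
df["time"] = pd.to_datetime(df["time"])
df = df.set_index("time")


def hours_per_year(frame, threshold, below=True):
    """Count hourly readings per year at or below (or strictly above) threshold.

    Hypothetical helper, not part of pandas.
    """
    mask = frame.le(threshold) if below else frame.gt(threshold)
    # Summing booleans counts the True values within each year.
    return mask.groupby(frame.index.year).sum()


def plot_hours(counts):
    """Draw one group of bars per year (hypothetical helper; needs matplotlib)."""
    import matplotlib.pyplot as plt

    ax = counts.plot.bar()
    ax.set_xlabel("year")
    ax.set_ylabel("hours")
    plt.show()


counts = hours_per_year(df, 5)
print(counts)
```

With the sample rows this gives 4, 3 and 0 hours for Antwerp, Rotterdam and Los Angeles in 1990, matching the expected output in the question. Calling plot_hours(counts) would then draw the bar chart.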

Upvotes: 1

Zabir Al Nazi Nabil

Reputation: 11198

This is without using pandas functions.

# expects df['time'] to hold strings such as '1990-01-01 00:00:00'
def get_hour_ud(df, threshold):
    timelist = list(df['time'])
    # extract the hour of day (0-23) from each timestamp string
    hour_list = [int(a.split(' ')[1].split(':')[0]) for a in timelist]
    # count the entries whose hour of day is above / below the threshold
    up_cnt = sum(a > threshold for a in hour_list)
    low_cnt = sum(a < threshold for a in hour_list)
    print(up_cnt)
    print(low_cnt)
    return up_cnt, low_cnt
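
The function above compares the hour of day parsed from each timestamp. A variant that compares the temperature readings instead, grouped by year and still without pandas, might look like the sketch below. The rows layout (a timestamp string paired with a dict of city temperatures) and the name count_hours_by_year are assumptions made for illustration, using the sample values from the question.

```python
from collections import defaultdict

# Rows copied from the question's sample: (timestamp, {city: temperature}).
rows = [
    ("1990-01-01 00:00:00", {"Antwerp": 0, "Rotterdam": 2, "Los Angeles": 15}),
    ("1990-01-01 01:00:00", {"Antwerp": 1, "Rotterdam": 4, "Los Angeles": 14}),
    ("1990-01-01 02:00:00", {"Antwerp": 3, "Rotterdam": 5, "Los Angeles": 15}),
    ("1990-01-01 03:00:00", {"Antwerp": 2, "Rotterdam": 6, "Los Angeles": 16}),
]


def count_hours_by_year(rows, threshold):
    """Return {year: {city: [hours_above, hours_at_or_below]}}.

    Hypothetical helper; assumes timestamps start with a four-digit year.
    """
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for stamp, temps in rows:
        year = int(stamp[:4])
        for city, temp in temps.items():
            # slot 0 counts readings above the threshold, slot 1 the rest
            counts[year][city][0 if temp > threshold else 1] += 1
    return counts


result = count_hours_by_year(rows, 5)
for city, (above, at_or_below) in result[1990].items():
    print(city, above, at_or_below)
```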

Upvotes: 0
