qwertz
qwertz

Reputation: 459

Pandas - Calculate daily percentage

i have the following issue:

2018-03-07 18:00:00    NW
2018-03-07 19:00:00    NW
2018-03-07 20:00:00    NW
2018-03-07 21:00:00     W
2018-03-07 22:00:00     W
2018-03-07 23:00:00     W
2018-03-08 00:00:00     W
2018-03-08 01:00:00    NW
2018-03-08 02:00:00    NW
2018-03-08 03:00:00    NW
2018-03-08 04:00:00    NW
2018-03-08 05:00:00    SW
2018-03-08 06:00:00    SE
2018-03-08 07:00:00    NE
2018-03-08 08:00:00     E
2018-03-08 09:00:00     W
2018-03-08 10:00:00     W
2018-03-08 11:00:00    NW

A Dataframe (dfWdir) which includes a hourly timestamp for a total of 360 days and the average wind direction in this hour as a label. Eg: N, NE, E, SE, S, SW, W, NW. I now want to know how much precent of the day there is wind blowing in north direction or any ohter direction. So at the end every precentige of a day would add up to 100%. EG: We have day 2018-03-07 and 20% of the wind is coming from NW direction 10% of S direction and so on.

By the following Code i get to the solution:

df2 = df.resample('D')['Wind'].value_counts()
df2.name = 'count_Wind'
df2 = df2.reset_index('Wind')

df3 = df2.resample('D')['count_Wind'].sum()
df2['percent'] = df2['count_Wind'].div(df3) * 100

final Dataframe

But the result shall be shown in a plot.area() graphic. So I need a column for each possible Winddirection for each timestep. Within this columns are the precent for each timestep.

i am thankful for any help!

Upvotes: 0

Views: 1275

Answers (3)

run-out
run-out

Reputation: 3184

Create two groupby objects. One counts the wind by the day, and the second counts total wind count by day. Then you can us the div method at the day level to get your answer.

import datetime
df = pd.DataFrame.from_dict({
    'Date' : [
        '2018-03-07 18:00:00',
        '2018-03-07 19:00:00',
        '2018-03-07 20:00:00',
        '2018-03-07 21:00:00',
        '2018-03-07 22:00:00',
        '2018-03-07 23:00:00',
        '2018-03-08 00:00:00',
        '2018-03-08 01:00:00',
        '2018-03-08 02:00:00'
    ],
    'Wind' : ['NW','NW','NW','W','W','W','W','NW','NW']} )

df['Date'] = pd.to_datetime(df['Date'], format='%Y-%m-%d %H:%M:%S',)

    Date              Wind
0   2018-03-07 18:00:00 NW
1   2018-03-07 19:00:00 NW
2   2018-03-07 20:00:00 NW
3   2018-03-07 21:00:00 W
4   2018-03-07 22:00:00 W
5   2018-03-07 23:00:00 W
6   2018-03-08 00:00:00 W
7   2018-03-08 01:00:00 NW
8   2018-03-08 02:00:00 NW 

# split datetime into day and time

df['Day'] = [d.date() for d in df['Date']]
df['Time'] = [d.time() for d in df['Date']]


# groupby Day and Wind,, then group on day, then divide 

day_wind = df.groupby(['Day', 'Wind']).agg({'Time': 'count'})
day_total = df.groupby(['Day']).agg({'Time': 'count'})
day_wind.div(day_total, level='Day').round(3) * 100

Output: 
               Time
Day        Wind 
2018-03-07  NW  50.0
             W  50.0
2018-03-08  NW  66.7
             W  33.3

Upvotes: 0

JoergVanAken
JoergVanAken

Reputation: 1286

I'm not sure how your result should exactly look like and assume that the colums of the DataFrame are Date and Direction.

You get the absolute values for the wind direction for every day by:

df.groupby([df.Date.dt.floor('D'), 'Direction'])['Direction'].agg(len).unstack().fillna(0)

The number of hours per day in your dataset:

df.groupby([df.Date.dt.floor('D')])['Date'].agg(len)

Combining this leads to the relative values.

Upvotes: 0

Terry
Terry

Reputation: 2811

With Date as index, resample by day and count Wind unique value

import pandas as pd

df = pd.DataFrame({
    'Date' : [
        '2018-03-07 18:00:00',
        '2018-03-07 19:00:00',
        '2018-03-07 20:00:00',
        '2018-03-07 21:00:00',
        '2018-03-07 22:00:00',
        '2018-03-07 23:00:00',
        '2018-03-08 00:00:00',
        '2018-03-08 01:00:00',
        '2018-03-08 02:00:00'
    ],
    'Wind' : ['NW','NW','NW','W','W','W','W','NW','NW']})

df['Date'] = pd.to_datetime(df['Date'])
df = df.set_index('Date')

df2 = df.resample('D')['Wind'].value_counts()
df2.name = 'count_Wind'
df2 = df2.reset_index('Wind')

df3 = df2.resample('D')['count_Wind'].sum()
df2['percent'] = df2['count_Wind'].div(df3) * 100

the result:

df2.head()


            Wind    count_Wind  percent
Date            
2018-03-07  NW  3   50.000000
2018-03-07  W   3   50.000000
2018-03-08  NW  2   66.666667
2018-03-08  W   1   33.333333

If you don't need count_Wind, use del df2['count_Wind']

Upvotes: 2

Related Questions