Reputation: 459
i have the following issue:
2018-03-07 18:00:00 NW
2018-03-07 19:00:00 NW
2018-03-07 20:00:00 NW
2018-03-07 21:00:00 W
2018-03-07 22:00:00 W
2018-03-07 23:00:00 W
2018-03-08 00:00:00 W
2018-03-08 01:00:00 NW
2018-03-08 02:00:00 NW
2018-03-08 03:00:00 NW
2018-03-08 04:00:00 NW
2018-03-08 05:00:00 SW
2018-03-08 06:00:00 SE
2018-03-08 07:00:00 NE
2018-03-08 08:00:00 E
2018-03-08 09:00:00 W
2018-03-08 10:00:00 W
2018-03-08 11:00:00 NW
A Dataframe (dfWdir) which includes a hourly timestamp for a total of 360 days and the average wind direction in this hour as a label. Eg: N, NE, E, SE, S, SW, W, NW. I now want to know how much precent of the day there is wind blowing in north direction or any ohter direction. So at the end every precentige of a day would add up to 100%. EG: We have day 2018-03-07 and 20% of the wind is coming from NW direction 10% of S direction and so on.
By the following Code i get to the solution:
df2 = df.resample('D')['Wind'].value_counts()
df2.name = 'count_Wind'
df2 = df2.reset_index('Wind')
df3 = df2.resample('D')['count_Wind'].sum()
df2['percent'] = df2['count_Wind'].div(df3) * 100
But the result shall be shown in a plot.area() graphic. So I need a column for each possible Winddirection for each timestep. Within this columns are the precent for each timestep.
i am thankful for any help!
Upvotes: 0
Views: 1275
Reputation: 3184
Create two groupby objects. One counts the wind by the day, and the second counts total wind count by day. Then you can us the div method at the day level to get your answer.
import datetime
df = pd.DataFrame.from_dict({
'Date' : [
'2018-03-07 18:00:00',
'2018-03-07 19:00:00',
'2018-03-07 20:00:00',
'2018-03-07 21:00:00',
'2018-03-07 22:00:00',
'2018-03-07 23:00:00',
'2018-03-08 00:00:00',
'2018-03-08 01:00:00',
'2018-03-08 02:00:00'
],
'Wind' : ['NW','NW','NW','W','W','W','W','NW','NW']} )
df['Date'] = pd.to_datetime(df['Date'], format='%Y-%m-%d %H:%M:%S',)
Date Wind
0 2018-03-07 18:00:00 NW
1 2018-03-07 19:00:00 NW
2 2018-03-07 20:00:00 NW
3 2018-03-07 21:00:00 W
4 2018-03-07 22:00:00 W
5 2018-03-07 23:00:00 W
6 2018-03-08 00:00:00 W
7 2018-03-08 01:00:00 NW
8 2018-03-08 02:00:00 NW
# split datetime into day and time
df['Day'] = [d.date() for d in df['Date']]
df['Time'] = [d.time() for d in df['Date']]
# groupby Day and Wind,, then group on day, then divide
day_wind = df.groupby(['Day', 'Wind']).agg({'Time': 'count'})
day_total = df.groupby(['Day']).agg({'Time': 'count'})
day_wind.div(day_total, level='Day').round(3) * 100
Output:
Time
Day Wind
2018-03-07 NW 50.0
W 50.0
2018-03-08 NW 66.7
W 33.3
Upvotes: 0
Reputation: 1286
I'm not sure how your result should exactly look like and assume that the colums of the DataFrame are Date
and Direction
.
You get the absolute values for the wind direction for every day by:
df.groupby([df.Date.dt.floor('D'), 'Direction'])['Direction'].agg(len).unstack().fillna(0)
The number of hours per day in your dataset:
df.groupby([df.Date.dt.floor('D')])['Date'].agg(len)
Combining this leads to the relative values.
Upvotes: 0
Reputation: 2811
With Date
as index, resample by day and count Wind
unique value
import pandas as pd
df = pd.DataFrame({
'Date' : [
'2018-03-07 18:00:00',
'2018-03-07 19:00:00',
'2018-03-07 20:00:00',
'2018-03-07 21:00:00',
'2018-03-07 22:00:00',
'2018-03-07 23:00:00',
'2018-03-08 00:00:00',
'2018-03-08 01:00:00',
'2018-03-08 02:00:00'
],
'Wind' : ['NW','NW','NW','W','W','W','W','NW','NW']})
df['Date'] = pd.to_datetime(df['Date'])
df = df.set_index('Date')
df2 = df.resample('D')['Wind'].value_counts()
df2.name = 'count_Wind'
df2 = df2.reset_index('Wind')
df3 = df2.resample('D')['count_Wind'].sum()
df2['percent'] = df2['count_Wind'].div(df3) * 100
the result:
df2.head()
Wind count_Wind percent
Date
2018-03-07 NW 3 50.000000
2018-03-07 W 3 50.000000
2018-03-08 NW 2 66.666667
2018-03-08 W 1 33.333333
If you don't need count_Wind
, use del df2['count_Wind']
Upvotes: 2