Reputation: 65
I am trying to get each day's value as a percentage of the sum for its period.
I have daily data with a datetime index, and I resampled it to yearly frequency using sum. Here is the code:
data_converted = data.resample('AS').sum()
What I want is each day's percentage of its year's total.
Here is what I wanted to do (pseudocode):
percentage=[ (data[all_day_in_year2000] / data_converted[year2000] )*100 ]
Note that my data covers more than one year.
If this is asking too much, feel free to ignore this part:
It would be nice if the solution were flexible, in case I resample the daily data to monthly or quarterly data instead.
Thank you for your consideration.
Sorry for not posting my data earlier.
Here is my data:
1986-01-02 25.56
1986-01-03 26.00
1986-01-06 26.53
1986-01-07 25.85
1986-01-08 25.87
1986-01-09 26.03
1986-01-10 25.65
1986-01-13 25.08
... ...
2019-06-04 53.50
2019-06-05 51.57
2019-06-06 52.59
2019-06-07 53.95
2019-06-10 53.33
2019-06-11 53.30
When I resample using sum, I get the following:
1986-01-01 3776.97
1987-01-01 4876.93
1988-01-01 4103.11
1989-01-01 5046.32
1990-01-01 6303.33
1991-01-01 5514.59
What I want is each daily value as a percentage of the corresponding yearly sum. The result can also be stored in a list with just the percentages, without the datetime index:
# daily value / yearly sum * 100
1986-01-02 (25.56 / 3776.97)*100
1986-01-03 (26.00 / 3776.97)*100
1986-01-06 (26.53 / 3776.97)*100
1986-01-07 (25.85 / 3776.97)*100
Similarly for the next year:
1986-02-03 (17.42 / 4876.93)*100
Upvotes: 1
Views: 287
Reputation: 17164
Just replace month with year in this answer and you are good to go:
import numpy as np
import pandas as pd
# pandas >= 1.4 spells this inclusive='left'; older versions used closed='left'
df = pd.DataFrame({'a': pd.date_range('1986-01-01','1986-03-01',inclusive='left')})
df['b'] = df.a.dt.month
# this is the answer, just a one-liner: each value divided by its month's sum
df['answer']=df.b/df.groupby(df.a.dt.month).b.transform('sum')*100
# only for inspecting the group sums; this is not required
df['just_check_sum'] = df.groupby(df.a.dt.month).b.transform('sum')
print(df.head())
           a  b    answer  just_check_sum
0 1986-01-01  1  3.225806              31   ===> 1/31*100 = 3.22 %
1 1986-01-02  1  3.225806              31
2 1986-01-03  1  3.225806              31
3 1986-01-04  1  3.225806              31
4 1986-01-05  1  3.225806              31
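For the yearly case asked about in the question, the same one-liner might look like the sketch below. The column names `a` and `b` follow the answer above; the sample values are made up for illustration and are not the asker's full data.

```python
import pandas as pd

# Hypothetical sample frame: a date column "a" and a value column "b"
df = pd.DataFrame({
    "a": pd.to_datetime(["1986-01-02", "1986-01-03", "1986-01-06",
                         "1987-01-02", "1987-01-05"]),
    "b": [25.56, 26.00, 26.53, 17.42, 18.10],
})

# Yearly sum, broadcast back onto every daily row of that year
yearly_total = df.groupby(df.a.dt.year).b.transform("sum")

# Each daily value as a percentage of its year's sum
df["answer"] = df.b / yearly_total * 100
print(df)
```

Within each year the `answer` column should add up to 100, which is a quick sanity check on the grouping.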
Upvotes: 0
Reputation: 4849
I prefer to use groupby and then apply per group:
by_year = data.groupby(pd.Grouper(freq='AS'))
percentage = by_year.apply(lambda x: x/x.sum() * 100)
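A self-contained sketch of this approach follows, with two assumptions of mine that are not part of the answer: it uses the `'YS'` alias (as far as I know, recent pandas releases deprecate `'AS'` in favor of it), and it passes `group_keys=False` so that `apply` keeps the original daily index instead of adding the group key as an extra index level. The sample values are hypothetical.

```python
import pandas as pd

# Hypothetical daily series that crosses a year boundary
data = pd.Series(
    [25.56, 26.00, 26.53, 17.42, 18.10],
    index=pd.to_datetime(["1986-12-29", "1986-12-30", "1986-12-31",
                          "1987-01-02", "1987-01-05"]),
)

# Group the rows by calendar year ("YS" = year start)
by_year = data.groupby(pd.Grouper(freq="YS"), group_keys=False)

# Within each year, divide every value by that year's sum
percentage = by_year.apply(lambda x: x / x.sum() * 100)
print(percentage)
```

Swapping `'YS'` for `'MS'` or `'QS'` should give monthly or quarterly percentages in the same way.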
Upvotes: 2
Reputation: 25259
You can use transform with resample, and then divide the daily values by the result of transform('sum').
Your sample data (I just grabbed part of the data you posted):
Out[11]:
val
dates
1986-01-02 25.56
1986-01-03 26.00
1986-01-06 26.53
1986-01-07 25.85
1986-01-08 25.87
1986-01-09 26.03
1986-01-10 25.65
1986-01-13 25.08
2019-06-04 53.50
2019-06-05 51.57
2019-06-06 52.59
2019-06-07 53.95
2019-06-10 53.33
2019-06-11 53.30
df.val / df.resample('AS').val.transform('sum') * 100
Out[14]:
dates
1986-01-02 12.373530
1986-01-03 12.586532
1986-01-06 12.843104
1986-01-07 12.513918
1986-01-08 12.523600
1986-01-09 12.601055
1986-01-10 12.417098
1986-01-13 12.141163
2019-06-04 16.811212
2019-06-05 16.204751
2019-06-06 16.525264
2019-06-07 16.952614
2019-06-10 16.757793
2019-06-11 16.748366
Name: val, dtype: float64
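For the flexibility the question asks about (yearly, quarterly, or monthly), one possible sketch wraps the same idea in a small helper. The function name `percent_of_period` and the sample values are hypothetical; grouping by `to_period` is used here in place of `resample`, as an alternative that avoids the frequency-alias differences between pandas versions.

```python
import pandas as pd

def percent_of_period(series, freq="Y"):
    """Return each value as a percentage of the sum of its period.

    freq is a pandas period alias: "Y" (year), "Q" (quarter), "M" (month).
    """
    # Label every row with the period it falls into
    periods = series.index.to_period(freq)
    # Period sum, broadcast back to the original daily index
    totals = series.groupby(periods).transform("sum")
    return series / totals * 100

# Hypothetical daily data covering two months and two years
s = pd.Series(
    [25.56, 26.00, 26.53, 24.10, 17.42],
    index=pd.to_datetime(["1986-01-02", "1986-01-03", "1986-01-06",
                          "1986-02-03", "1987-01-02"]),
    name="val",
)

yearly = percent_of_period(s, "Y")
quarterly = percent_of_period(s, "Q")
monthly = percent_of_period(s, "M")
print(monthly)
```

If only the numbers are needed, without the dates, `yearly.tolist()` gives a plain list.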
Upvotes: 3