Reputation: 8628
I have the following raw data:
df =
MONTH DAY ID GROUP
1 1 222 1
1 1 111 2
1 2 333 2
...
12 1 XXS 1
For each date, I need to calcuate the percent of entries that have Group
equal to 1
. The result should be a new dataframe in the following format (The date should be dd/mm/yy
):
date,percent
01/01/16,50
02/01/16,0
...
I tried this, but it does not give the expected result, because I don't know how to proceed from this point. Not sure how to pass date
to a single column in the format dd/mm/yy
and how to calculate the percent of entries having GROUP == 1
:
new_df = df.groupby(['MONTH', 'DAY']).agg(['count'])
UPDATE:
print(df.types) gives the following output:
MONTH float64
DAY float64
GROUP float64
ID object
date datetime64[ns]
This is the sample real data:
MONTH DAY GROUP ID date
1.0 4.0 2.0 00085163 2016-01-04
1.0 4.0 1.0 000F9334 2016-01-04
1.0 4.0 2.0 002744A2 2016-01-04
1.0 4.0 2.0 00337BB1 2016-01-04
1.0 4.0 2.0 00374DE5 2016-01-04
Upvotes: 2
Views: 647
Reputation: 27879
This is a very robust solution but it seems to work:
temp = df.groupby(['MONTH', 'DAY']).agg({'GROUP': lambda x: float(len(x[x==1]))/x.count()*100}).astype(float).reset_index()
print temp
temp.rename(columns={'GROUP': 'PERCENT'}, inplace=True)
temp['DATE'] = '2016-' + temp['MONTH'].map(int).map(str) + '-' + temp['DAY'].map(int).map(str)
temp['DATE'] = temp['DATE'].apply(lambda x: pd.to_datetime(x))
final = temp[['DATE', 'PERCENT']].set_index('DATE')
Upvotes: 0
Reputation: 210872
UPDATE: for the GROUP
column of "float64" dtype
In [67]: df
Out[67]:
MONTH DAY GROUP ID
0 1.0 4.0 2.0 00085163
1 1.0 4.0 1.0 000F9334
2 1.0 4.0 2.0 002744A2
3 1.0 4.0 2.0 00337BB1
4 1.0 4.0 2.0 00374DE5
In [68]: (df.assign(date=pd.to_datetime(df.assign(YEAR=pd.datetime.now().year)
...: .loc[:, ['YEAR','MONTH','DAY']])
...: .dt.strftime('%d/%m/%y'))
...: .groupby('date', as_index=0)['GROUP']
...: .agg({'percent':lambda x: len(np.where(np.isclose(x,1))[0])/x.count()*100})
...: )
...:
Out[68]:
date percent
0 04/01/16 20.0
Old answer for integer GROUP column:
In [40]: df.groupby(['MONTH', 'DAY'], as_index=0)['GROUP'].agg({'percent':lambda x: len(x[x==1])/x.count()*100})
Out[40]:
MONTH DAY percent
0 1 1 50
1 1 2 0
2 12 1 100
if you need date as a single column:
In [50]: df['date'] = pd.to_datetime(df.assign(YEAR=pd.datetime.now().year).loc[:, ['YEAR','MONTH','DAY']]).dt.strftime('%d/%m/%y')
In [51]: df
Out[51]:
MONTH DAY ID GROUP date
0 1 1 222 1 01/01/16
1 1 1 111 2 01/01/16
2 1 2 333 2 02/01/16
3 12 1 444 1 01/12/16
In [52]: df.groupby('date', as_index=0)['GROUP'].agg({'percent':lambda x: len(x[x==1])/x.count()*100})
Out[52]:
date percent
0 01/01/16 50
1 01/12/16 100
2 02/01/16 0
Upvotes: 2