How to group entries by date and calculate a percent

Question

I have the following raw data:

df = 

MONTH   DAY   ID    GROUP
1       1     222   1
1       1     111   2
1       2     333   2
...
12      1     XXS   1

For each date, I need to calcuate the percent of entries that have Group equal to 1. The result should be a new dataframe in the following format (The date should be dd/mm/yy):

date,percent
01/01/16,50
02/01/16,0
...

I tried this, but it does not give the expected result, because I don't know how to proceed from this point. Not sure how to pass date to a single column in the format dd/mm/yy and how to calculate the percent of entries having GROUP == 1:

new_df = df.groupby(['MONTH', 'DAY']).agg(['count'])

UPDATE:

print(df.types) gives the following output:

MONTH                    float64
DAY                      float64
GROUP                    float64
ID                       object
date                     datetime64[ns]

This is the sample real data:

MONTH   DAY GROUP   ID  date
1.0     4.0 2.0 00085163    2016-01-04
1.0     4.0 1.0 000F9334    2016-01-04
1.0     4.0 2.0 002744A2    2016-01-04
1.0     4.0 2.0 00337BB1    2016-01-04
1.0     4.0 2.0 00374DE5    2016-01-04

MaxU - stand with Ukraine · Accepted Answer

UPDATE: for the GROUP column of "float64" dtype

In [67]: df
Out[67]:
   MONTH  DAY  GROUP        ID
0    1.0  4.0    2.0  00085163
1    1.0  4.0    1.0  000F9334
2    1.0  4.0    2.0  002744A2
3    1.0  4.0    2.0  00337BB1
4    1.0  4.0    2.0  00374DE5

In [68]: (df.assign(date=pd.to_datetime(df.assign(YEAR=pd.datetime.now().year)
    ...:                                  .loc[:, ['YEAR','MONTH','DAY']])
    ...:                   .dt.strftime('%d/%m/%y'))
    ...:    .groupby('date', as_index=0)['GROUP']
    ...:    .agg({'percent':lambda x: len(np.where(np.isclose(x,1))[0])/x.count()*100})
    ...: )
    ...:
Out[68]:
       date  percent
0  04/01/16     20.0

Old answer for integer GROUP column:

In [40]: df.groupby(['MONTH', 'DAY'], as_index=0)['GROUP'].agg({'percent':lambda x: len(x[x==1])/x.count()*100})
Out[40]:
   MONTH  DAY  percent
0      1    1       50
1      1    2        0
2     12    1      100

if you need date as a single column:

In [50]: df['date'] = pd.to_datetime(df.assign(YEAR=pd.datetime.now().year).loc[:, ['YEAR','MONTH','DAY']]).dt.strftime('%d/%m/%y')

In [51]: df
Out[51]:
   MONTH  DAY   ID  GROUP      date
0      1    1  222      1  01/01/16
1      1    1  111      2  01/01/16
2      1    2  333      2  02/01/16
3     12    1  444      1  01/12/16

In [52]: df.groupby('date', as_index=0)['GROUP'].agg({'percent':lambda x: len(x[x==1])/x.count()*100})
Out[52]:
       date  percent
0  01/01/16       50
1  01/12/16      100
2  02/01/16        0

How to group entries by date and calculate a percent

Answers (2)

Related Questions