python pandas add new column with values grouped count

Question

I want to add a new column with the number of times the points were over 700 and after the year 2014.

import pandas as pd

ipl_data = {'Year': [2014,2015,2014,2015,2014,2015,2016,2017,2016,2014,2015,2017],
            'Points':[876,789,863,673,741,812,756,788,694,701,804,690]}
df = pd.DataFrame(ipl_data)

grouped = df.groupby('Year')
df.loc[(df['Points'] > 700) & (df['Year'] > 2014), 'High_points'] = df['Points']
#df['Point_per_year_gr_700']=df.groupby(by='Year')['Points'].transform('count')
df['Point_per_year_gr_700']=grouped['Points'].agg(np.size))

the end dataframe should look like this, but I cant get the 'Point_per_year_gr_700' right

    Year  Points         Point_per_year_gr_700  High_points
0   2014     876                                NaN
1   2015     789                     3        789.0
2   2014     863                                NaN
3   2015     673                                NaN
4   2014     741                                NaN
5   2015     812                     3        812.0
6   2016     756                     1        756.0
7   2017     788                     1        788.0
8   2016     694                                NaN
9   2014     701                                NaN
10  2015     804                     3        804.0
11  2017     690                                NaN

ALollz · Accepted Answer

Use where to mask the DataFrame to NaN where your condition isn't met. You can use this to create the High_points column and also to exclude rows that shouldn't count when you groupby year and find how many rows satisfy High_points each year.

df['High_points'] = df['Points'].where(df['Year'].gt(2014) & df['Points'].gt(700))
df['ppy_gt700'] = (df.where(df['High_points'].notnull())
                     .groupby('Year')['Year'].transform('size'))

    Year  Points  High_Points  ppy_gt700
0   2014     876          NaN        NaN
1   2015     789        789.0        3.0
2   2014     863          NaN        NaN
3   2015     673          NaN        NaN
4   2014     741          NaN        NaN
5   2015     812        812.0        3.0
6   2016     756        756.0        1.0
7   2017     788        788.0        1.0
8   2016     694          NaN        NaN
9   2014     701          NaN        NaN
10  2015     804        804.0        3.0
11  2017     690          NaN        NaN

python pandas add new column with values grouped count

Answers (1)

Related Questions