SSV
SSV

Reputation: 139

How to get average of values of a column of a dataframe based on another column?

I have a dataframe with columns 'gender', and 'year'. I need to get the ratio of female to males based on the year. Below is a sample dataframe.

data = {'gender' : ['m', 'm', 'm', 'f', 'm', 'f'],
     'Year' : ['2000', '2000', '2003', '2000', '2001', '2001']}
my_df = pd.DataFrame (data, columns = ['gender','Year'])
my_df = my_df.sort_values('Year')   #trial
print(my_df )

My output should be:

data = { 'Year' : ['2000', '2001', '2003'],
     'ratio' : [0.33,0.5,0]}
my_df = pd.DataFrame (data, columns = ['Year', 'ratio'])
print(my_df)

This is what I tried: I first sort the dataframe based on year so that it is easier to get the total count. But I am not sure how to get the number of males in that specific year.

Upvotes: 0

Views: 133

Answers (1)

jezrael
jezrael

Reputation: 862441

Use crosstab first and for ration of counts divide columns:

df = pd.crosstab(my_df['Year'], my_df['gender'])
df['ratio'] = df['f'].div(df['m'])
print(df )
gender  f  m  ratio
Year               
2000    1  2    0.5
2001    1  1    1.0
2003    0  1    0.0

If need ratio to all values add normalize=0:

df = pd.crosstab(my_df['Year'], my_df['gender'], normalize=0).add_prefix('ratio_')
print(df )
gender   ratio_f   ratio_m
Year                      
2000    0.333333  0.666667
2001    0.500000  0.500000
2003    0.000000  1.000000

If need ratio only female:

df = my_df['gender'].eq('f').groupby(my_df['Year']).mean().reset_index(name='ratio')
print(df)
   Year     ratio
0  2000  0.333333
1  2001  0.500000
2  2003  0.000000    

Upvotes: 1

Related Questions