Reputation: 139
I have a dataframe with columns 'gender', and 'year'. I need to get the ratio of female to males based on the year. Below is a sample dataframe.
data = {'gender' : ['m', 'm', 'm', 'f', 'm', 'f'],
'Year' : ['2000', '2000', '2003', '2000', '2001', '2001']}
my_df = pd.DataFrame (data, columns = ['gender','Year'])
my_df = my_df.sort_values('Year') #trial
print(my_df )
My output should be:
data = { 'Year' : ['2000', '2001', '2003'],
'ratio' : [0.33,0.5,0]}
my_df = pd.DataFrame (data, columns = ['Year', 'ratio'])
print(my_df)
This is what I tried: I first sort the dataframe based on year so that it is easier to get the total count. But I am not sure how to get the number of males in that specific year.
Upvotes: 0
Views: 133
Reputation: 862441
Use crosstab
first and for ration of counts divide columns:
df = pd.crosstab(my_df['Year'], my_df['gender'])
df['ratio'] = df['f'].div(df['m'])
print(df )
gender f m ratio
Year
2000 1 2 0.5
2001 1 1 1.0
2003 0 1 0.0
If need ratio to all values add normalize=0
:
df = pd.crosstab(my_df['Year'], my_df['gender'], normalize=0).add_prefix('ratio_')
print(df )
gender ratio_f ratio_m
Year
2000 0.333333 0.666667
2001 0.500000 0.500000
2003 0.000000 1.000000
If need ratio only female
:
df = my_df['gender'].eq('f').groupby(my_df['Year']).mean().reset_index(name='ratio')
print(df)
Year ratio
0 2000 0.333333
1 2001 0.500000
2 2003 0.000000
Upvotes: 1