Saqib Ali

Reputation: 4448

Calculate the proportions based on groupby in python

I have a dataframe as follows:

student-gender  student-current-enrollment  student-native-country  car-owner
M               BS                          Cuba                    N
F               BS                          Brazil                  N
F               BS                          US                      N
M               MS                          US                      Y
F               MS                          US                      N
M               BS                          Cuba                    N
F               BS                          Brazil                  N
F               MS                          US                      N
F               MS                          US                      N
M               BS                          Cuba                    Y
F               BS                          Brazil                  N

I need to find out what proportion of students from each country own a car. How do I do that?

Upvotes: 0

Views: 280

Answers (2)

PieCot

Reputation: 3639

You can use groupby and agg with a lambda function:

df.groupby('student-native-country').agg(
    # divide by the size of each group (len(x)), not by the whole frame
    {'car-owner': lambda x: (x == 'Y').sum() * 100 / len(x)}
)

Here is the result:

                        car-owner
student-native-country           
Brazil                   0.000000
Cuba                    33.333333
US                      20.000000

On the other hand, if you want the percentage relative only to the students who own a car:

df.groupby('student-native-country').agg(
    {'car-owner': lambda x: (x == 'Y').sum() * 100 / (df['car-owner'] == 'Y').sum()}
)

And in this case, the result is:

                        car-owner
student-native-country           
Brazil                        0.0
Cuba                         50.0
US                           50.0
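
The two denominators are easy to mix up, so here is a minimal, self-contained sketch that computes both percentages side by side. It assumes the question's data, retyped by hand from the table (only the two relevant columns); the names `within_country` and `of_all_owners` are illustrative and not part of the answer above.

```python
import pandas as pd

# Data retyped from the question's table (assumption: only these two columns matter).
df = pd.DataFrame({
    "student-native-country": ["Cuba", "Brazil", "US", "US", "US", "Cuba",
                               "Brazil", "US", "US", "Cuba", "Brazil"],
    "car-owner": ["N", "N", "N", "Y", "N", "N", "N", "N", "N", "Y", "N"],
})

grouped = df.groupby("student-native-country")["car-owner"]

# Denominator = number of students in each country.
within_country = grouped.agg(lambda x: (x == "Y").sum() * 100 / len(x))

# Denominator = number of car owners in the whole frame.
total_owners = (df["car-owner"] == "Y").sum()
of_all_owners = grouped.agg(lambda x: (x == "Y").sum() * 100 / total_owners)

print(within_country)
print(of_all_owners)
```

The first series answers "what share of each country's students own a car", the second "what share of all car owners come from each country".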

Upvotes: 1

Andrej Kesely

Reputation: 195528

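print(
    # recode "N"/"Y" as 0/1 so that each group's mean is its share of car owners
    df.replace({"N": 0, "Y": 1})
    .groupby("student-native-country")["car-owner"]
    .mean()
    * 100  # convert the fraction to a percentage
)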

Prints:

student-native-country
Brazil     0.000000
Cuba      33.333333
US        20.000000
Name: car-owner, dtype: float64
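
This works because the mean of a 0/1 column is the fraction of ones. A minimal sketch of the same idea, assuming the question's data retyped by hand, compares the column to "Y" directly instead of calling replace on the whole frame (the name `pct` is illustrative):

```python
import pandas as pd

# Data retyped from the question's table (assumption: only these two columns matter).
df = pd.DataFrame({
    "student-native-country": ["Cuba", "Brazil", "US", "US", "US", "Cuba",
                               "Brazil", "US", "US", "Cuba", "Brazil"],
    "car-owner": ["N", "N", "N", "Y", "N", "N", "N", "N", "N", "Y", "N"],
})

# eq("Y") yields True/False; the mean of booleans is the fraction of True values.
pct = (
    df["car-owner"].eq("Y")
    .groupby(df["student-native-country"])
    .mean()
    * 100
)
print(pct)
```

Only the car-owner column is touched here, so a "Y" or "N" appearing in any other column would be left alone.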

Upvotes: 1
