Ipsider
Ipsider

Reputation: 571

How to count ratios of two column values by group?

I have a list of flights as a pandas dataframe like this:

airline        status          number    ...

Aer Lingus     some error A        14    ...
Aeroflot       success             47    ...
Air Canada     success              2    ...
Air Dolomiti   success              2    ...
Air Europa     some error B         4    ...
Air France     success             40    ...
Alitalia       some error A        10    ...
....

I make some api calls for each flight which either succeeds or gives an error.

What I would like to have is the ratio of success for each airline. So I know that I can count the occurrences of the values in column "airline" and "status" with groupby, but I fail to do this for the ratio of the values.

def calculate_quote_success_ratios(flights):

    success_ratio_per_airline = flights.groupby(['airline', 'status']).count()
    # TODO: Include ratio with failures!

    return success_ratio_per_airline

the expected output should be something like this:

airline        success_ratio

Aer Lingus     0.72
Aeroflot       0.845
Air Canada     0.935
Air Dolomiti   0.5
Air Europa     ...
....

Edit: To be clear, the column df['numbers'] is not relevant. So I only want to count the percentage of "successes" to all sorts of errors for each airline.

Upvotes: 1

Views: 681

Answers (5)

Dhruv
Dhruv

Reputation: 425

A single line code that doesn't refer to the name of frame again. For just in case this thing is applied in continuation to some other command and frame might not have any name at that moment. Or someone likes to write single line commands

# Using @Chris's dataset
      airline          status      number
0    Aer Lingus  some error A      14
1      Aeroflot       success      47
2    Air Canada       success       2
3  Air Dolomiti       success       2
4    Air Europa  some error B       4
5    Air France       success      40
6      Alitalia  some error A      10
7      Alitalia       success      10
8    Air France  some error B      10
9    Aer Lingus       success      12


df.groupby(by = ['airline', 'status'])['status'].count().unstack(level = -1, fill_value = 0)/
.apply(lambda row: row['success']/row.values.sum(), axis = 1)


print(df)

airline
Aer Lingus      0.5
Aeroflot        1.0
Air Canada      1.0
Air Dolomiti    1.0
Air Europa      0.0
Air France      0.5
Alitalia        0.5
dtype: float64

You might put some comments in between the continuous commands to understand better.

Upvotes: 1

Abhishek Joshi
Abhishek Joshi

Reputation: 21

For you this will work.

I am making a small change to the correct marked answer.

res = pd.crosstab(df['airline'], df['status'], values="", aggfunc='count', normalize='index')

print(res[['success']])

Upvotes: 1

Rohit Raj
Rohit Raj

Reputation: 339

group wise count of number of flights

count_flight=df.groupby('airline').status.count()

group wise count of number of successes

count_success=df[df['status']=='success'].groupby('airline').status.count()

pandas div return groupwise ratio of two previous dataframes.

count_success.div(count_flight).fillna(0)

Upvotes: 3

jpp
jpp

Reputation: 164713

You can use pd.crosstab with normalize='index':

# data from @Chris

res = pd.crosstab(df['airline'], df['status'], values=df['number'],
                  aggfunc='count', normalize='index')

print(res[['success']])

status       success
airline             
AerLingus        0.5
Aeroflot         1.0
AirCanada        1.0
AirDolomiti      1.0
AirEuropa        0.0
AirFrance        0.5
Alitalia         0.5

Upvotes: 1

It_is_Chris
It_is_Chris

Reputation: 14103

I had to modify your example for it to work but you can simply just do df.groupby(['airline', 'status']).sum() / df.groupby(['airline']).sum()

original df:

airline           status    number
0   Aer Lingus  some error A    14
1   Aeroflot    success 47
2   Air Canada  success 2
3   Air Dolomiti    success 2
4   Air Europa  some error B    4
5   Air France  success 40
6   Alitalia    some error A    10
7   Alitalia    success 10
8   Air France  some error B    10
9   Aer Lingus  success 12



df.groupby(['airline', 'status']).sum() / df.groupby(['airline']).sum()

                                  number
airline         status  
Aer Lingus     some error A     0.538462
               success          0.461538
Aeroflot       success          1.000000
Air Canada      success         1.000000
Air Dolomiti    success         1.000000
Air Europa    some error B      1.000000
Air France    some error B      0.200000
              success           0.800000
Alitalia      some error A      0.500000
              success           0.500000

Upvotes: 2

Related Questions