Reputation: 571
I have a list of flights as a pandas dataframe like this:
airline status number ...
Aer Lingus some error A 14 ...
Aeroflot success 47 ...
Air Canada success 2 ...
Air Dolomiti success 2 ...
Air Europa some error B 4 ...
Air France success 40 ...
Alitalia some error A 10 ...
....
I make some api calls for each flight which either succeeds or gives an error.
What I would like to have is the ratio of success for each airline. So I know that I can count the occurrences of the values in column "airline" and "status" with groupby, but I fail to do this for the ratio of the values.
def calculate_quote_success_ratios(flights):
success_ratio_per_airline = flights.groupby(['airline', 'status']).count()
# TODO: Include ratio with failures!
return success_ratio_per_airline
the expected output should be something like this:
airline success_ratio
Aer Lingus 0.72
Aeroflot 0.845
Air Canada 0.935
Air Dolomiti 0.5
Air Europa ...
....
Edit: To be clear, the column df['numbers']
is not relevant. So I only want to count the percentage of "successes" to all sorts of errors for each airline.
Upvotes: 1
Views: 681
Reputation: 425
A single line code that doesn't refer to the name of frame again.
For just in case this thing is applied in continuation to some other command and frame might not have any name at that moment. Or someone likes to write single line commands
# Using @Chris's dataset
airline status number
0 Aer Lingus some error A 14
1 Aeroflot success 47
2 Air Canada success 2
3 Air Dolomiti success 2
4 Air Europa some error B 4
5 Air France success 40
6 Alitalia some error A 10
7 Alitalia success 10
8 Air France some error B 10
9 Aer Lingus success 12
df.groupby(by = ['airline', 'status'])['status'].count().unstack(level = -1, fill_value = 0)/
.apply(lambda row: row['success']/row.values.sum(), axis = 1)
print(df)
airline
Aer Lingus 0.5
Aeroflot 1.0
Air Canada 1.0
Air Dolomiti 1.0
Air Europa 0.0
Air France 0.5
Alitalia 0.5
dtype: float64
You might put some comments in between the continuous commands to understand better.
Upvotes: 1
Reputation: 21
For you this will work.
I am making a small change to the correct marked answer.
res = pd.crosstab(df['airline'], df['status'], values="", aggfunc='count', normalize='index')
print(res[['success']])
Upvotes: 1
Reputation: 339
group wise count of number of flights
count_flight=df.groupby('airline').status.count()
group wise count of number of successes
count_success=df[df['status']=='success'].groupby('airline').status.count()
pandas div return groupwise ratio of two previous dataframes.
count_success.div(count_flight).fillna(0)
Upvotes: 3
Reputation: 164713
You can use pd.crosstab
with normalize='index'
:
# data from @Chris
res = pd.crosstab(df['airline'], df['status'], values=df['number'],
aggfunc='count', normalize='index')
print(res[['success']])
status success
airline
AerLingus 0.5
Aeroflot 1.0
AirCanada 1.0
AirDolomiti 1.0
AirEuropa 0.0
AirFrance 0.5
Alitalia 0.5
Upvotes: 1
Reputation: 14103
I had to modify your example for it to work but you can simply just do df.groupby(['airline', 'status']).sum() / df.groupby(['airline']).sum()
original df:
airline status number
0 Aer Lingus some error A 14
1 Aeroflot success 47
2 Air Canada success 2
3 Air Dolomiti success 2
4 Air Europa some error B 4
5 Air France success 40
6 Alitalia some error A 10
7 Alitalia success 10
8 Air France some error B 10
9 Aer Lingus success 12
df.groupby(['airline', 'status']).sum() / df.groupby(['airline']).sum()
number
airline status
Aer Lingus some error A 0.538462
success 0.461538
Aeroflot success 1.000000
Air Canada success 1.000000
Air Dolomiti success 1.000000
Air Europa some error B 1.000000
Air France some error B 0.200000
success 0.800000
Alitalia some error A 0.500000
success 0.500000
Upvotes: 2