SoupriceCoder
SoupriceCoder

Reputation: 286

Pandas groupby, how to edit the result

I'm trying to write a code that should give me a summary of the data provided. However, there are few problems I'm facing right now.

When I use the groupby function, I get the following output.

I want the output to give me a sorted result. For example, the Airport with the highest "SkyTeam" Customer appear at the top. I've written some parts of the code but couldn't find a way to connect them.

#(Airport name, Skyteam passenger, airport total passenger)
#Sorted with 2 rule, "Skyteam Highest to Lowest" and "Total pax Highest to lowest"
('JFK', '12903454','37796770')
('IAD', '1050960', '14616186')

df_3 = pd.merge(df_initial, df_info[['Operating Airline Name','Operating Airline Alliance']], on='Operating Airline Name', how='left')
    
df_5 = df_3.groupby(['Origin Airport','Operating Airline Alliance'])["Operating Airline   Capacity"].sum()

df_6 = df_3.groupby(['Origin Airport'])["Operating Airline   Capacity"].sum()

for Airports in df_6.items() :
    print(Airports)

The output:

('IAD', 14616186)
('JFK', 37796770)

Origin Airport  Operating Airline Alliance
IAD             LCC                             292410
                Oneworld                        955450
                Qatar                           129210
                SkyTeam                        1050960
                Star Alliance                  7458776
                US-Domestic                     598389
JFK             LCC                            8735749
                Oneworld                       6530773
                Qatar                           240528
                SkyTeam                       12903454
                Star Alliance                  2722954
                US-Domestic                     217540
Name: Operating Airline   Capacity, dtype: int64

Thanks to everyone who've replied to my question. I've went on with M.Erkin's approach and edited the code he provided a bit.

df_3 = pd.merge(df_initial, df_info[['Operating Airline Name','Operating Airline Alliance']], on='Operating Airline Name', how='left')
df_3 = df_3.groupby(["Origin Airport","Origin Region Name","Origin Country Name","Origin City Name", "Operating Airline Alliance"],as_index=False).agg({"Operating Airline   Capacity":np.sum, "DepCount":np.sum}).sort_values(by='Operating Airline   Capacity', ascending=False)


df_4 = pd.merge(df_initial, df_infoTK[['Operating Airline Name','Operating Airline Alliance']], on='Operating Airline Name', how='left')
df_4 = df_4.groupby(["Origin Airport","Origin Region Name","Origin Country Name","Origin City Name", "Operating Airline Alliance"]).agg({"Operating Airline   Capacity":np.sum, "DepCount":np.sum}).sort_values(by='Operating Airline   Capacity', ascending=False)


df_5 = pd.merge(df_initial, df_infoQT[['Operating Airline Name','Operating Airline Alliance']], on='Operating Airline Name', how='left')
df_5 = df_5.groupby(["Origin Airport","Origin Region Name","Origin Country Name","Origin City Name", "Operating Airline Alliance"]).agg({"Operating Airline   Capacity":np.sum, "DepCount":np.sum}).sort_values(by='Operating Airline   Capacity', ascending=False)

In the final part i've consolidated the results with a merge function. I did it in 2 steps because i don't really know how to do it in 1 step but if you do, please feel free to advise!

df_6 = pd.merge(df_3, df_4[["Operating Airline   Capacity", "DepCount"]], on='Origin Airport', how='left')
#df_6.fillna(0,inplace=True)

df_7 = pd.merge(df_6 ,df_5[["Operating Airline   Capacity", "DepCount"]], on='Origin Airport', how='left')

Leaving this here. It might help someone seeking similar projects!

Upvotes: 0

Views: 173

Answers (2)

M.Erkin
M.Erkin

Reputation: 120

Here is a solution;

df_3 = df_3.groupby(["Origin Airport", "Operating Airline Alliance" ]).agg({"Operating Airline   Capacity":np.sum}).sort_values(by='Operating Airline   Capacity', ascending=False)

Basically it groups the dataframe with respect to airpot and alliance.Then with aggregate function, it sums up the capacities for each group and these sums become 'Operating Airline Capacity' values for each group, at this point it becomes a new dataframe.And then finally it sorts the rows with respect to Operating Airline Capacity column, in descending order.

Upvotes: 1

Gijs Wobben
Gijs Wobben

Reputation: 2060

You can still sort a multi-index, you just have to specify how.. If you have multiple criteria, just chain them (.sort_values("<first column>").sort_values("<second column>").

import pandas

# Recreating your dataset with the multi-index
data = [
    {"Origin Airport": "IAD", "Operating Airline Alliance": "LCC", "Capacity": 292410},
    {"Origin Airport": "IAD", "Operating Airline Alliance": "Oneworld", "Capacity": 955450},
    {"Origin Airport": "IAD", "Operating Airline Alliance": "Qatar", "Capacity": 129210},
    {"Origin Airport": "IAD", "Operating Airline Alliance": "SkyTeam", "Capacity": 1050960},
    {"Origin Airport": "IAD", "Operating Airline Alliance": "Star Alliance", "Capacity": 7458776},
    {"Origin Airport": "IAD", "Operating Airline Alliance": "US-Domestic", "Capacity": 598389},
    {"Origin Airport": "JFK", "Operating Airline Alliance": "LCC", "Capacity": 8735749},
    {"Origin Airport": "JFK", "Operating Airline Alliance": "Oneworld", "Capacity": 6530773},
    {"Origin Airport": "JFK", "Operating Airline Alliance": "Qatar", "Capacity": 240528},
    {"Origin Airport": "JFK", "Operating Airline Alliance": "SkyTeam", "Capacity": 12903454},
    {"Origin Airport": "JFK", "Operating Airline Alliance": "Star Alliance", "Capacity": 2722954},
    {"Origin Airport": "JFK", "Operating Airline Alliance": "US-Domestic", "Capacity": 217540},
]
index = pandas.MultiIndex.from_tuples(
    [(line["Origin Airport"], line["Operating Airline Alliance"]) for line in data],
    names=["Origin Airport", "Operating Airline Alliance"],
)
df = pandas.DataFrame([{"Capacity": line["Capacity"]} for line in data], index=index)

# Sort the dataframe by the capacity column
df.sort_values("Capacity")

# Sort by the largest origin airport
df.groupby("Origin Airport").sum().sort_values("Capacity")

Upvotes: 0

Related Questions