Reputation: 286
I'm trying to write a code that should give me a summary of the data provided. However, there are few problems I'm facing right now.
When I use the groupby function, I get the following output.
I want the output to give me a sorted result. For example, the Airport with the highest "SkyTeam" Customer appear at the top. I've written some parts of the code but couldn't find a way to connect them.
#(Airport name, Skyteam passenger, airport total passenger)
#Sorted with 2 rule, "Skyteam Highest to Lowest" and "Total pax Highest to lowest"
('JFK', '12903454','37796770')
('IAD', '1050960', '14616186')
df_3 = pd.merge(df_initial, df_info[['Operating Airline Name','Operating Airline Alliance']], on='Operating Airline Name', how='left')
df_5 = df_3.groupby(['Origin Airport','Operating Airline Alliance'])["Operating Airline Capacity"].sum()
df_6 = df_3.groupby(['Origin Airport'])["Operating Airline Capacity"].sum()
for Airports in df_6.items() :
print(Airports)
The output:
('IAD', 14616186)
('JFK', 37796770)
Origin Airport Operating Airline Alliance
IAD LCC 292410
Oneworld 955450
Qatar 129210
SkyTeam 1050960
Star Alliance 7458776
US-Domestic 598389
JFK LCC 8735749
Oneworld 6530773
Qatar 240528
SkyTeam 12903454
Star Alliance 2722954
US-Domestic 217540
Name: Operating Airline Capacity, dtype: int64
Thanks to everyone who've replied to my question. I've went on with M.Erkin's approach and edited the code he provided a bit.
df_3 = pd.merge(df_initial, df_info[['Operating Airline Name','Operating Airline Alliance']], on='Operating Airline Name', how='left')
df_3 = df_3.groupby(["Origin Airport","Origin Region Name","Origin Country Name","Origin City Name", "Operating Airline Alliance"],as_index=False).agg({"Operating Airline Capacity":np.sum, "DepCount":np.sum}).sort_values(by='Operating Airline Capacity', ascending=False)
df_4 = pd.merge(df_initial, df_infoTK[['Operating Airline Name','Operating Airline Alliance']], on='Operating Airline Name', how='left')
df_4 = df_4.groupby(["Origin Airport","Origin Region Name","Origin Country Name","Origin City Name", "Operating Airline Alliance"]).agg({"Operating Airline Capacity":np.sum, "DepCount":np.sum}).sort_values(by='Operating Airline Capacity', ascending=False)
df_5 = pd.merge(df_initial, df_infoQT[['Operating Airline Name','Operating Airline Alliance']], on='Operating Airline Name', how='left')
df_5 = df_5.groupby(["Origin Airport","Origin Region Name","Origin Country Name","Origin City Name", "Operating Airline Alliance"]).agg({"Operating Airline Capacity":np.sum, "DepCount":np.sum}).sort_values(by='Operating Airline Capacity', ascending=False)
In the final part i've consolidated the results with a merge function. I did it in 2 steps because i don't really know how to do it in 1 step but if you do, please feel free to advise!
df_6 = pd.merge(df_3, df_4[["Operating Airline Capacity", "DepCount"]], on='Origin Airport', how='left')
#df_6.fillna(0,inplace=True)
df_7 = pd.merge(df_6 ,df_5[["Operating Airline Capacity", "DepCount"]], on='Origin Airport', how='left')
Leaving this here. It might help someone seeking similar projects!
Upvotes: 0
Views: 173
Reputation: 120
Here is a solution;
df_3 = df_3.groupby(["Origin Airport", "Operating Airline Alliance" ]).agg({"Operating Airline Capacity":np.sum}).sort_values(by='Operating Airline Capacity', ascending=False)
Basically it groups the dataframe with respect to airpot and alliance.Then with aggregate function, it sums up the capacities for each group and these sums become 'Operating Airline Capacity' values for each group, at this point it becomes a new dataframe.And then finally it sorts the rows with respect to Operating Airline Capacity column, in descending order.
Upvotes: 1
Reputation: 2060
You can still sort a multi-index, you just have to specify how.. If you have multiple criteria, just chain them (.sort_values("<first column>").sort_values("<second column>")
.
import pandas
# Recreating your dataset with the multi-index
data = [
{"Origin Airport": "IAD", "Operating Airline Alliance": "LCC", "Capacity": 292410},
{"Origin Airport": "IAD", "Operating Airline Alliance": "Oneworld", "Capacity": 955450},
{"Origin Airport": "IAD", "Operating Airline Alliance": "Qatar", "Capacity": 129210},
{"Origin Airport": "IAD", "Operating Airline Alliance": "SkyTeam", "Capacity": 1050960},
{"Origin Airport": "IAD", "Operating Airline Alliance": "Star Alliance", "Capacity": 7458776},
{"Origin Airport": "IAD", "Operating Airline Alliance": "US-Domestic", "Capacity": 598389},
{"Origin Airport": "JFK", "Operating Airline Alliance": "LCC", "Capacity": 8735749},
{"Origin Airport": "JFK", "Operating Airline Alliance": "Oneworld", "Capacity": 6530773},
{"Origin Airport": "JFK", "Operating Airline Alliance": "Qatar", "Capacity": 240528},
{"Origin Airport": "JFK", "Operating Airline Alliance": "SkyTeam", "Capacity": 12903454},
{"Origin Airport": "JFK", "Operating Airline Alliance": "Star Alliance", "Capacity": 2722954},
{"Origin Airport": "JFK", "Operating Airline Alliance": "US-Domestic", "Capacity": 217540},
]
index = pandas.MultiIndex.from_tuples(
[(line["Origin Airport"], line["Operating Airline Alliance"]) for line in data],
names=["Origin Airport", "Operating Airline Alliance"],
)
df = pandas.DataFrame([{"Capacity": line["Capacity"]} for line in data], index=index)
# Sort the dataframe by the capacity column
df.sort_values("Capacity")
# Sort by the largest origin airport
df.groupby("Origin Airport").sum().sort_values("Capacity")
Upvotes: 0