Reputation: 51
I am trying to merge total sales value according to the city on columns & invoice number on the row using for loop, the end result only happen for the last city.
import pandas as pd
Super=pd.read_excel(r"C:\supermarket_sales.xlsx")
def Summary(city):
Location=Super[Super["City"]==city]
PDT=Location.groupby(["Product line","Invoice ID"])["Total"].sum()
PDTT=PDT.reset_index()
PDTT.rename(columns={"Total": city+"Total"},inplace=True)
return PDTT
for city in Super['City'].unique():
PDT = Summary(city)
Invoice=Super["Invoice ID"]
for city in Super['City'].unique():
df=Summary(city)
L3V=pd.merge(Invoice,df[["Invoice ID",city+"Total"]],on ="Invoice ID",how="left")
Expected Output
Upvotes: 1
Views: 114
Reputation: 1704
The end-result only includes last city because in your for
loop, in each iteration you are recreating new L3V dataframe from pd.merge
, and information for previous city is lost. So at the end of the loop, you are left with last city's info only.
You need to hold each city's merged data, and merge others into it. Do you last loop as below:
Invoice = Super["Invoice ID"]
L3V = Invoice.copy()
for city in Super['City'].unique():
df = Summary(city)
L3V = pd.merge(L3V, df[["Invoice ID",city+"Total"]], on ="Invoice ID", how="left")
Upvotes: 1