For loop for pd.merge

Question

I am trying to merge total sales value according to the city on columns & invoice number on the row using for loop, the end result only happen for the last city.

import pandas as pd
Super=pd.read_excel(r"C:\supermarket_sales.xlsx")

def Summary(city):
    Location=Super[Super["City"]==city]
    PDT=Location.groupby(["Product line","Invoice ID"])["Total"].sum()
    PDTT=PDT.reset_index()
    PDTT.rename(columns={"Total": city+"Total"},inplace=True)
    return PDTT

for city in Super['City'].unique():
    PDT = Summary(city)

Invoice=Super["Invoice ID"]

for city in Super['City'].unique():
    df=Summary(city)
    L3V=pd.merge(Invoice,df[["Invoice ID",city+"Total"]],on ="Invoice ID",how="left")

Output by code:

Expected Output

Ank · Accepted Answer

The end-result only includes last city because in your for loop, in each iteration you are recreating new L3V dataframe from pd.merge, and information for previous city is lost. So at the end of the loop, you are left with last city's info only.

You need to hold each city's merged data, and merge others into it. Do you last loop as below:

Invoice = Super["Invoice ID"]
L3V = Invoice.copy()

for city in Super['City'].unique():
    df = Summary(city)
    L3V = pd.merge(L3V, df[["Invoice ID",city+"Total"]], on ="Invoice ID", how="left")

For loop for pd.merge

Answers (1)

Related Questions