Hedge
Hedge

Reputation: 51

For loop for pd.merge

I am trying to merge total sales value according to the city on columns & invoice number on the row using for loop, the end result only happen for the last city.

import pandas as pd
Super=pd.read_excel(r"C:\supermarket_sales.xlsx")

def Summary(city):
    Location=Super[Super["City"]==city]
    PDT=Location.groupby(["Product line","Invoice ID"])["Total"].sum()
    PDTT=PDT.reset_index()
    PDTT.rename(columns={"Total": city+"Total"},inplace=True)
    return PDTT

for city in Super['City'].unique():
    PDT = Summary(city)

Invoice=Super["Invoice ID"]

for city in Super['City'].unique():
    df=Summary(city)
    L3V=pd.merge(Invoice,df[["Invoice ID",city+"Total"]],on ="Invoice ID",how="left")

Output by code: enter image description here

Expected Output

enter image description here

Upvotes: 1

Views: 114

Answers (1)

Ank
Ank

Reputation: 1704

The end-result only includes last city because in your for loop, in each iteration you are recreating new L3V dataframe from pd.merge, and information for previous city is lost. So at the end of the loop, you are left with last city's info only.

You need to hold each city's merged data, and merge others into it. Do you last loop as below:

Invoice = Super["Invoice ID"]
L3V = Invoice.copy()

for city in Super['City'].unique():
    df = Summary(city)
    L3V = pd.merge(L3V, df[["Invoice ID",city+"Total"]], on ="Invoice ID", how="left")

Upvotes: 1

Related Questions