Reputation: 509
Got Input df1 and df2
df1:
Subcategory_Desc Segment_Desc Flow Side Row_no
APPLE APPLE LOOSE Apple Kanzi Front Row 1
APPLE APPLE LOOSE Apple Jazz Front Row 1
CITRUS ORANGES LOOSE Orange Navel Front Row 1
PEAR PEARS LOOSE Lemon Right End Row 1
AVOCADOS AVOCADOS LOOSE Avocado Back Row 1
TROPICAL FRUIT KIWI FRUIT Kiwi Gold Back Row 1
TROPICAL FRUIT KIWI FRUIT Kiwi Green Left End Row 1
df2:
Subcategory_Desc Segment_Desc Flow
TROPICAL FRUIT KIWI FRUIT 5pk Kids Kiwi
APPLE APPLE LOOSE Apple GoldenDel
AVOCADOS AVOCADOS LOOSE Avocado Tray
Scenario: Dataframe df2 rows should be inserted to dataframe df1 considering below condition:
Expected Output:
Subcategory_Desc Segment_Desc Flow Side Row_no
APPLE APPLE LOOSE Apple Kanzi Front Row 1
APPLE APPLE LOOSE Apple Jazz Front Row 1
CITRUS ORANGES LOOSE Orange Navel Front Row 1
APPLE APPLE LOOSE Apple GoldenDel Front Row 1
PEAR PEARS LOOSE Lemon Right End Row 1
AVOCADOS AVOCADOS LOOSE Avocado Back Row 1
TROPICAL FRUIT KIWI FRUIT Kiwi Gold Back Row 1
TROPICAL FRUIT KIWI FRUIT 5pk Kids Kiwi Back Row 1
AVOCADOS AVOCADOS LOOSE Avocado Tray Back Row 1
TROPICAL FRUIT KIWI FRUIT Kiwi Green Left End Row 1
Not sure what simple logic can be used for this purpose.
Upvotes: 1
Views: 76
Reputation: 13488
So, given the following dataframes:
import pandas as pd
df1 = pd.DataFrame(
{
"Subcategory_Desc": {
0: "APPLE",
1: "APPLE",
2: "CITRUS",
3: "PEAR",
4: "AVOCADOS",
5: "TROPICAL FRUIT",
6: "TROPICAL FRUIT",
},
"Segment_Desc": {
0: "APPLE LOOSE",
1: "APPLE LOOSE",
2: "ORANGES LOOSE",
3: "PEARS LOOSE",
4: "AVOCADOS LOOSE",
5: "KIWI FRUIT",
6: "KIWI FRUIT",
},
"Flow": {
0: "Apple Kanzi",
1: "Apple Jazz",
2: "Orange Navel",
3: "Lemon",
4: "Avocado",
5: "Kiwi Gold",
6: "Kiwi Green",
},
"Side": {
0: "Front",
1: "Front",
2: "Front",
3: "Right_End",
4: "Back",
5: "Back",
6: "Left_End",
},
"Row_no": {
0: "Row 1",
1: "Row 1",
2: "Row 1",
3: "Row 1",
4: "Row 1",
5: "Row 1",
6: "Row 1",
},
}
)
df2 = pd.DataFrame(
{
"Subcategory_Desc": {0: "TROPICAL FRUIT", 1: "APPLE", 2: "AVOCADOS"},
"Segment_Desc": {0: "KIWI FRUIT", 1: "APPLE LOOSE", 2: "AVOCADOS LOOSE"},
"Flow": {0: "5pk Kids Kiwi", 1: "Apple GoldenDel", 2: "Avocado Tray"},
}
)
You could try this:
# Initialize new column
df2["idx"] = ""
# Find indice of first match in df1
for _, row2 in df2.iterrows():
for i, row1 in df1.iterrows():
if i + 1 >= df1.shape[0]:
break
if (
row1["Subcategory_Desc"] == row2["Subcategory_Desc"]
and row1["Segment_Desc"] == row2["Segment_Desc"]
):
row2["idx"] = i
df2 = df2.sort_values(by="idx").reset_index(drop=True)
# Starting from previous indice, find insertion indice in df1
for i, idx in enumerate(df2["idx"]):
side_of_idx = df1.loc[idx, "Side"]
df2.loc[i, "pos"] = df1.index[df1["Side"] == side_of_idx].to_list()[-1] + 1
positions = df2["pos"].astype("int").to_list()
# Clean up df2
df2 = df2.drop(columns=["idx", "pos"])
df2["Side"] = df2["Row_no"] = ""
# Iterate on df1 to insert new rows
for i, pos in enumerate(positions):
# Fill missing values
df2.loc[i, "Side"] = df1.loc[pos - 1, "Side"]
df2.loc[i, "Row_no"] = df1.loc[pos, "Row_no"]
# Insert row
df1 = pd.concat(
[df1.iloc[:pos], pd.DataFrame([df2.iloc[i]]), df1.iloc[pos:]], ignore_index=True
).reset_index(drop=True)
# Increment next position since df1 has changed
if i < len(positions) - 1:
positions[i + 1] += 1
And so:
print(df1)
# Outputs
Subcategory_Desc Segment_Desc Flow Side Row_no
0 APPLE APPLE LOOSE Apple Kanzi Front Row 1
1 APPLE APPLE LOOSE Apple Jazz Front Row 1
2 CITRUS ORANGES LOOSE Orange Navel Front Row 1
3 APPLE APPLE LOOSE Apple GoldenDel Front Row 1
4 PEAR PEARS LOOSE Lemon Right_End Row 1
5 AVOCADOS AVOCADOS LOOSE Avocado Back Row 1
6 TROPICAL FRUIT KIWI FRUIT Kiwi Gold Back Row 1
7 TROPICAL FRUIT KIWI FRUIT 5pk Kids Kiwi Back Row 1
8 AVOCADOS AVOCADOS LOOSE Avocado Tray Back Row 1
9 TROPICAL FRUIT KIWI FRUIT Kiwi Green Left_End Row 1
Upvotes: 1