Reputation: 21
I am trying to calculate the following 'new_field' column by triple looping through the 'name', 'val_id' and 'fac_id' column with the following conditions.
1.Within each 'val_id' loop if 'product' == 'CL' then min of 'val_against' and 'our_val_amt' e.g. min( val_against (134), our_val_amt (424)) therefore 'NEW FIELD' = 134. Also if the sum of new_field exceeds "our_val_amt", then subtract it from "our_val_amt". e.g. for val_id "xx4", (200 + 300 + 50) = 550 which exceeds our_val_amt = 510, so NEW FILED = 510 - 500 (i.e. 200 + 300 after this sum exceeds our_val_amt) = 10.
2.If product != 'CL' and is in the same 'val_id' group. The remainder to be subtracted from 'our_val_amt' to be inserted in 'new_field'. e.g 'our_val_amt' (424) - from step 1 (134) = 290. This inserted above 'NEW FIELD'.
If [product] doesn't have 'CL' it just needs to spread the [our_val_amt] between each [val_id]. For example val_id = 'xx7' our_val_amt =700 this is spread in the first row (650) inserted and then left over 700 - 650 = 50 is inserted in next row with the following being 0 as per the example.
3.Repeat steps for val_id xx2. NEW FIELD calculation for CL = 104 and XL = 472 - 104 = 368.
Currently the output works correctly for 'name' - compx(row 0 - 9) and begins to not correctly calculate onwards. I'm also unsure how this code works as i'm new to Pandas and appreciate if someone can explain the defined function how the program is thinking.
df = pd.DataFrame(data=[["compx","xx1","yy1",424,418,"XL"],["compx","xx1","yy2",424,134,"CL"],["compx","xx2","yy3",472,60,"DL"],["compx","xx2","yy4",472,104,"CL"], ["compx", "xx3", "yy5", 490, 50, "XL"], ["compx", "xx3", "yy6", 490, 500, "CL"], ["compx", "xx3", "yy7", 490, 200, "DL"], ["compx", "xx4", "yy8", 510, 200, "CL"], ["compx", "xx4", "yy9", 510, 300, "CL"], ["compx", "xx4", "yy10", 510, 50, "CL"], ["compy", "xx5", "yy11", 510, 200, "CL"], ["compy", "xx5", "yy12", 510, 300, "CL"], ["compy", "xx5", "yy12", 510, 50, "CL"], ["compy", "xx5", "yy13", 510, 30, "DL"], ["compz", "xx6", "yy14", 350, 200, "CL"], ["compz", "xx6", "yy15", 350, 100, "CL"], ["compz", "xx6", "yy16", 350, 50, "XL"], ["compz", "xx6", "yy17", 350, 50, "DL"], ["compz", "xx7", "yy18", 700, 650, "DL"], ["compz", "xx7", "yy19", 700, 200, "DL"], ["compz", "xx7", "yy20", 700, 400, "XL"] ], columns=["name","val_id","fac_id","our_val_amt","val_against","product"])
df
# Compute tuple of "our_val_amt", "val_against" and "product" for easy processing as one column. It is hard to process multiple columns with "transform()".
df["the_tuple"] = df[["our_val_amt", "val_against", "product"]].apply(tuple, axis=1)
def compute_new_field_for_cl(g):
# df_g is a tuple ("our_val_amt", "val_against", "product") indexed as (0, 1, 2).
df_g = g.apply(pd.Series)
df_g["new_field"] = df_g.apply(lambda row: min(row[0], row[1]) if row[2] == "CL" else 0, axis=1)
df_g["cumsum"] = df_g["new_field"].cumsum()
df_g["new_field"] = df_g.apply(lambda row: 0 if row["cumsum"] > row[0] else row["new_field"], axis=1)
df_g["max_cumsum"] = df_g["new_field"].cumsum()
df_g["new_field"] = df_g.apply(lambda row: row[0] - row["max_cumsum"] if row["cumsum"] > row[0] else row["new_field"], axis=1)
return df_g["new_field"]
# Apply above function and compute new field values for "CL".
df["new_field"] = df.groupby("val_id")[["the_tuple"]].transform(compute_new_field_for_cl)
# Re-compute tuple of "our_val_amt", "new_field" and "product".
df["the_tuple"] = df[["our_val_amt", "new_field", "product"]].apply(tuple, axis=1)
def compute_new_field_for_not_cl(g):
# df_g is a tuple ("our_val_amt", "new_field", "product") indexed as (0, 1, 2).
df_g = g.apply(pd.Series)
result_sr = df_g.where(df_g[2] != "CL")[0] - df_g[df_g[2] == "CL"][1].sum()
result_sr = result_sr.fillna(0) + df_g[1]
return result_sr
# Apply above function and compute new field values for "CL".
df["new_field"] = df.groupby("val_id")[["the_tuple"]].transform(compute_new_field_for_not_cl)
df = df.drop("the_tuple", axis=1)
df
Dataset and new_field output trying to achieve.
name |val_id |fac_id | our_val_amt | val_against | product | new_field
compx | xx1 | yy1 | 424 | 418 | XL | 290
compx | xx1 | yy2 | 424 | 134 | CL | 134
compx | xx2 | yy3 | 472 | 60 | DL | 368
compx | xx2 | yy4 | 472 | 104 | CL | 104
compx | xx3 | yy5 | 490 | 50 | XL | 0
compx | xx3 | yy6 | 490 | 500 | CL | 490
compx | xx3 | yy7 | 490 | 200 | DL | 0
compx | xx4 | yy8 | 510 | 200 | CL | 200
compx | xx4 | yy9 | 510 | 300 | CL | 300
compx | xx4 | yy10 | 510 | 50 | CL | 10
compy | xx5 | yy11 | 510 | 200 | CL | 200
compy | xx5 | yy12 | 510 | 300 | CL | 300
compy | xx5 | yy12 | 510 | 50 | CL | 10
compy | xx5 | yy13 | 510 | 30 | DL | 0
compz | xx6 | yy14 | 350 | 200 | CL | 200
compz | xx6 | yy15 | 350 | 100 | CL | 100
compz | xx6 | yy16 | 350 | 50 | XL | 50
compz | xx6 | yy17 | 350 | 50 | DL | 0
compz | xx7 | yy18 | 700 | 650 | DL | 650
compz | xx7 | yy19 | 700 | 200 | DL | 50
compz | xx7 | yy20 | 700 | 400 | XL | 0
Dataset and new_field output that i'm currently getting
name |val_id |fac_id | our_val_amt | val_against | product | new_field
compx | xx1 | yy1 | 424 | 418 | XL | 290
compx | xx1 | yy2 | 424 | 134 | CL | 134
compx | xx2 | yy3 | 472 | 60 | DL | 368
compx | xx2 | yy4 | 472 | 104 | CL | 104
compx | xx3 | yy5 | 490 | 50 | XL | 0
compx | xx3 | yy6 | 490 | 500 | CL | 490
compx | xx3 | yy7 | 490 | 200 | DL | 0
compx | xx4 | yy8 | 510 | 200 | CL | 200
compx | xx4 | yy9 | 510 | 300 | CL | 300
compx | xx4 | yy10 | 510 | 50 | CL | 10
compy | xx5 | yy11 | 510 | 200 | CL | 200
compy | xx5 | yy12 | 510 | 300 | CL | 300
compy | xx5 | yy12 | 510 | 50 | CL | 10
compy | xx5 | yy13 | 510 | 30 | DL | 10
compz | xx6 | yy14 | 350 | 200 | CL | 200
compz | xx6 | yy15 | 350 | 100 | CL | 100
compz | xx6 | yy16 | 350 | 50 | XL | 50
compz | xx6 | yy17 | 350 | 50 | DL | 50
compz | xx7 | yy18 | 700 | 650 | DL | 700
compz | xx7 | yy19 | 700 | 200 | DL | 700
compz | xx7 | yy20 | 700 | 400 | XL | 700
Upvotes: 0
Views: 70
Reputation: 4098
The code failed at a corner case where there are multiple non-CL products and the our_val_amt
had to be spread such that last few products may get 0
value. I asked about this use case in last/preceding question; but it went unanswered. You may have some corner cases like this and need to perform a comprehensive test.
Following is the updated logic. The comments are added before each processing line to explain what it does.
df = pd.DataFrame(data=[["compx","xx1","yy2",424,134,"CL",134],["compx","xx1","yy1",424,418,"XL",290],["compx","xx2","yy4",472,104,"CL",104],["compx","xx2","yy3",472,60,"DL",368],["compx","xx3","yy6",490,500,"CL",490],["compx","xx3","yy5",490,50,"XL",0],["compx","xx3","yy7",490,200,"DL",0],["compx","xx4","yy8",510,200,"CL",200],["compx","xx4","yy9",510,300,"CL",300],["compx","xx4","yy10",510,50,"CL",10],["compy","xx5","yy11",510,200,"CL",200],["compy","xx5","yy12",510,300,"CL",300],["compy","xx5","yy12",510,50,"CL",10],["compy","xx5","yy13",510,30,"DL",0],["compz","xx6","yy14",350,200,"CL",200],["compz","xx6","yy15",350,100,"CL",100],["compz","xx6","yy16",350,50,"XL",50],["compz","xx6","yy17",350,50,"DL",0],["compz","xx7","yy18",700,650,"DL",650],["compz","xx7","yy19",700,200,"DL",50],["compz","xx7","yy20",700,400,"XL",0]], columns=["name","val_id","fac_id","our_val_amt","val_against","product","new_field_expected"])
# Compute tuple of "our_val_amt", "val_against" and "product" for easy processing as one column. It is hard to process multiple columns with "transform()".
df["the_tuple"] = df[["our_val_amt", "val_against", "product"]].apply(tuple, axis=1)
def compute_new_field_for_cl(g):
# df_g is a tuple ("our_val_amt", "val_against", "product") indexed as (0, 1, 2).
df_g = g.apply(pd.Series)
df_g["new_field"] = df_g.apply(lambda row: min(row[0], row[1]) if row[2] == "CL" else 0, axis=1)
# Cumulative sum for comparison
df_g["cumsum"] = df_g["new_field"].cumsum()
# Previous row's sum for comparison
df_g["cumsum_prev"] = df_g["cumsum"].shift(periods=1)
# if our_val_amt >= sum then use min(our_val_amt, val_against)
# else if our_val_amt < sum then take partial of first record such that our_val_amt == sum else take `0` for the rest records
df_g["new_field"] = df_g.apply(lambda row: 0 if row["cumsum_prev"] > row[0] else row[0] - row["cumsum_prev"] if row["cumsum"] > row[0] else row["new_field"], axis=1)
return df_g["new_field"]
# Apply above function and compute new field values for "CL".
df["new_field"] = df.groupby("val_id")[["the_tuple"]].transform(compute_new_field_for_cl)
# Re-compute tuple of "our_val_amt", "val_against", "new_field" and "product".
df["the_tuple"] = df[["our_val_amt", "val_against", "new_field", "product"]].apply(tuple, axis=1)
def compute_new_field_for_not_cl(g):
# df_g is a tuple ("our_val_amt", "val_against", "new_field", "product") indexed as (0, 1, 2, 3).
df_g = g.apply(pd.Series)
# print(df_g)
cl_sum = df_g[df_g[3] == "CL"][2].sum()
if cl_sum > 0:
df_g["new_field"] = df_g.where(df_g[3] != "CL")[0] - df_g[df_g[3] == "CL"][2].sum()
df_g["new_field"] = df_g["new_field"].fillna(df_g[2])
# Cumulative sum for comparison
df_g["cumsum"] = df_g["new_field"].cumsum()
# if our_val_amt < sum then take diff (our_val_amt - sum) else take `0` for the rest records
df_g["new_field"] = df_g.apply(lambda row: 0 if row["cumsum"] > row[0] else row["new_field"], axis=1)
else:
df_g["new_field"] = df_g.apply(lambda row: min(row[0], row[1]) if row[3] != "CL" else row[2], axis=1)
# Cumulative sum for comparison
df_g["cumsum"] = df_g["new_field"].cumsum()
# Previous row's sum for comparison
df_g["cumsum_prev"] = df_g["cumsum"].shift(periods=1)
# if our_val_amt >= sum then use min(our_val_amt, val_against)
# else if our_val_amt < sum then take partial of first record such that our_val_amt == sum else take `0` for the rest records
df_g["new_field"] = df_g.apply(lambda row: 0 if row["cumsum_prev"] > row[0] else row[0] - row["cumsum_prev"] if row["cumsum"] > row[0] else row["new_field"], axis=1)
return df_g["new_field"]
# Apply above function and compute new field values for "CL".
df["new_field"] = df.groupby("val_id")[["the_tuple"]].transform(compute_new_field_for_not_cl)
df = df.drop("the_tuple", axis=1)
print(df)
Output:
name val_id fac_id our_val_amt val_against product new_field_expected new_field
0 compx xx1 yy2 424 134 CL 134 134.00
1 compx xx1 yy1 424 418 XL 290 290.00
2 compx xx2 yy4 472 104 CL 104 104.00
3 compx xx2 yy3 472 60 DL 368 368.00
4 compx xx3 yy6 490 500 CL 490 490.00
5 compx xx3 yy5 490 50 XL 0 0.00
6 compx xx3 yy7 490 200 DL 0 0.00
7 compx xx4 yy8 510 200 CL 200 200.00
8 compx xx4 yy9 510 300 CL 300 300.00
9 compx xx4 yy10 510 50 CL 10 10.00
10 compy xx5 yy11 510 200 CL 200 200.00
11 compy xx5 yy12 510 300 CL 300 300.00
12 compy xx5 yy12 510 50 CL 10 10.00
13 compy xx5 yy13 510 30 DL 0 0.00
14 compz xx6 yy14 350 200 CL 200 200.00
15 compz xx6 yy15 350 100 CL 100 100.00
16 compz xx6 yy16 350 50 XL 50 50.00
17 compz xx6 yy17 350 50 DL 0 0.00
18 compz xx7 yy18 700 650 DL 650 650.00
19 compz xx7 yy19 700 200 DL 50 50.00
20 compz xx7 yy20 700 400 XL 0 0.00
Upvotes: 1