Reputation: 616
I have a dataframe:
4 Amazon 2 x 0.0 2.0 4.0 6.0 8.0
5 Amazon 2 y 0.0 1.0 2.0 3.0 4.0
df2:
Amazon 2 60
Netflix 1 100
Netflix 2 110
I am trying to compare the slope values in the axis column to the corresponding optimal cost values and extract the slope, x and y values that are closest to the optimal cost.
Expected output:
0 Amazon 1 120 2 0.8
1 Amazon 2 57 4 2
Upvotes: 0
Views: 111
Reputation: 120509
# Extract data and rearrange index
# Now slope and optim have the same index
slope = df1.loc[df1["Axis"] == "slope"].set_index(["Name", "Segment"]).drop(columns="Axis")
optim = df2.set_index(["Name", "Segment"]).reindex(slope.index)
# Find the closest column to the optimal cost
idx = slope.sub(optim.values).abs().idxmin(axis="columns")
>>> idx
Name Segment
Amazon 1 3 # column '3' 120 <- optimal: 115
2 3 # column '3' 57 <- optimal: 60
dtype: object
>>> df1.set_index(["Name", "Segment", "Axis"]) \
.groupby(["Name", "Segment"], as_index=False) \
.apply(lambda x: x[idx[x.name]]).unstack() \
.rename_axis(columns=None).reset_index(["Name", "Segment"])
Name Segment slope x y
0 Amazon 1 120.0 2.0 0.8
1 Amazon 2 57.0 4.0 2.0
Upvotes: 0
Reputation: 13417
You can use pd.merge_asof
to perform this type of merge quickly. However there is some preprocessing you'll need to do to your data.
df1
to match the format of the expected output (e.g. where "slope", "x", and "y" are columns instead of rowsNaNs
from the merge keys AND sort both df1
and df2
by their merge keys (this is a requirement of pd.merge_asof
that we need to do explicitly). Merge keys are going to be the "slope" and "optimal cost" columns.# Reshape df1
df1_reshaped = df1.set_index(["Name", "Segment", "Axis"]).unstack(-1).stack(0)
# Drop NaN, sort_values by the merge keys, ensure merge keys are same dtype
df1_reshaped = df1_reshaped.dropna(subset=["slope"]).sort_values("slope")
df2 = df2.sort_values("Optimal Cost").astype({"Optimal Cost": float})
# Perform the merge
out = (
pd.merge_asof(
df2,
df1_reshaped,
left_on="Optimal Cost",
right_on="slope",
by=["Name", "Segment"],
direction="nearest"
).dropna()
)
print(out)
Name Segment Optimal Cost slope x y
0 Amazon 2 60.0 57.0 4.0 2.0
3 Amazon 1 115.0 120.0 2.0 0.8
And that's it!
If you're curious, here are what df1_reshaped
and df2
look like prior to the merge (after the preprocessing).
>>> print(df1_reshaped)
Axis slope x y
Name Segment
Amazon 2 2 50.0 2.0 1.0
3 57.0 4.0 2.0
4 72.0 6.0 3.0
5 81.0 8.0 4.0
1 2 100.0 1.0 0.4
3 120.0 2.0 0.8
4 127.0 3.0 1.2
5 140.0 4.0 1.6
>>> print(df2)
Name Segment Optimal Cost
1 Amazon 2 60.0
2 Netflix 1 100.0
3 Netflix 2 110.0
0 Amazon 1 115.0
Upvotes: 3