Reputation: 1266
I have two dataframes (df1
, df2
):
x id
35 4
55 3
92 2
99 5
and
id x val
1 (0.0, 50.0] 1.2
2 (90.0, inf] 0.5
3 (0.0, 50.0] 8.9
3 (50.0, 90.0] 9.9
4 (0.0, 50.0] 4.3
4 (50.0, 90.0] 1.1
4 (90.0, inf] 2.9
5 (50.0, 90.0] 3.2
5 (90.0, inf] 5.1
Want to add a new column x_new
in the first dataframe, df1
, which values depends on the lookup-table from the second dataframe, df2
. According to the id
and the value of x
, there is a special multiplier, to get the new value x_new
:
x id x_new
35 4 35*4.3
55 3 55*9.9
92 2 ...
99 5 ...
The value ranges in the second dataframe were created with a pandas cut:
df2 = df.groupby(['id', pd.cut(df.x, [0,50,90,np.inf])]).apply(lambda x: np.average(x['var1']/x['var2'], weights=x['var1'])).reset_index(name='val')
My idea is starting with the pandas built in lookup function:
df1['x_new'] = df.lookup(df.index, df['id'])
Don't know how to get it work.
Also see my previous question for more information about the code.
Upvotes: 2
Views: 1732
Reputation: 62493
pd.Interval
40 in pd.Interval(0.0, 50.0, closed='right')
evaluates as True
pd.Interval
is in a index, a value passed using .loc
, will find the correct interval.
df2.loc[(3, 35)]
will return 8.9
df2
is multi-indexed, the values for the index, are passed as a tuple
.KeyError
will occur if a value from df1
doesn't exist in the index of df2
, so you may need to write a function with try-except
.
df1_in_df2 = df1[df1.id.isin(df2.index.get_level_values(0))]
will find all df1.id
in df2.index
import pandas as pd
import numpy as np
# setupt dataframes
df1 = pd.DataFrame({'id': [4, 3, 2, 5], 'x': [35, 55, 92, 99]})
df2 = pd.DataFrame({'id': [1, 2, 3, 3, 4, 4, 4, 5, 5], 'x': [pd.Interval(0.0, 50.0, closed='right'), pd.Interval(90.0, np.inf, closed='right'), pd.Interval(0.0, 50.0, closed='right'), pd.Interval(50.0, 90.0, closed='right'), pd.Interval(0.0, 50.0, closed='right'), pd.Interval(50.0, 90.0, closed='right'), pd.Interval(90.0, np.inf, closed='right'), pd.Interval(50.0, 90.0, closed='right'), pd.Interval(90.0, np.inf, closed='right')], 'val': [1.2, 0.5, 8.9, 9.9, 4.3, 1.1, 2.9, 3.2, 5.1]})
# set id and x as the index of df2
df2 = df2.set_index(['id', 'x'])
# display(df2)
val
id x
1 (0.0, 50.0] 1.2
2 (90.0, inf] 0.5
3 (0.0, 50.0] 8.9
(50.0, 90.0] 9.9
4 (0.0, 50.0] 4.3
(50.0, 90.0] 1.1
(90.0, inf] 2.9
5 (50.0, 90.0] 3.2
(90.0, inf] 5.1
# use a lambda expression to pass id and x of df1 as index labels to df2 and return val
df1['val'] = df1.apply(lambda x: df2.loc[(x['id'], x['x'])], axis=1)
# multiple x and val to get x_new
df1['x_new'] = df1.x.mul(df1.val)
# display(df1)
id x val x_new
0 4 35 4.3 150.5
1 3 55 9.9 544.5
2 2 92 0.5 46.0
3 5 99 5.1 504.9
Upvotes: 3