Reputation: 117
I am trying to write a function that maps between 2 data-frames and returns a value based on conditions. One is a mapping DF, given below.
data1 = {'variable':['A','A','A','B','B','C','C'],
'lower' : [0,100,200,0,5,1,4],
'upper' : [100,200,300,5,10,4,8],
'value' : [32,15,45,18,46,33,45]}
df1 = pd.DataFrame(data1, columns = ['variable','lower', 'upper','value'])
print(df1)
The other is the DF where I would apply the mapping, given below.
data2 = {'A':[23,244,131,54,258,117,43,224],
'B' : [0,2,1,9,7,4,6,3],
'C' : [2,1,7,4,8,3,8,1]}
df2 = pd.DataFrame(data2, columns = ['A','B', 'C'])
print(df2)
What I am trying to do is create new column mapped_A
, which will check the mapping DF
for all rows with A
and return the value based on the upper and lower ranges. For example the first row for mapped_A
column will be 32
since 0 < 23 <= 100
. Same with columns B
and C
.
Desired Output:
I can hard-code the logic something like the following, but there must be better ways to handle this.
conditions = [(df2['A'] > 0) & (df2['A'] <= 100),
(df2['A'] > 100) & (df2['A'] <= 200),
(df2['A'] > 200) & (df2['A'] <= 300)]
choices = [32, 15, 45]
df2['mapped_A'] = np.select(conditions, choices, default='null')
Upvotes: 1
Views: 2744
Reputation: 189
Well I think ThePyGuy's solution is much more graceful but you could make a custom function that would save you the copy and pasting like so:
def maps2(d1, d2):
out = {}
for i in d2:
condNums = [n for n in range(len(d1["variable"])) if d1["variable"][n] == i]
n = "mapped_" + i
out[n] = []
for u in range(len(d2[i])):
for o in condNums:
if d2[i][u] >= d1["lower"][o] and d2[i][u] <= d1["upper"][o]:
out[n].append(d1["value"][o])
return out
Also I changed your logic conditions in the if
because you're desired output has conflicts with your original logic (the >
is now >=
).
Upvotes: 3
Reputation: 18416
Even though use of eval
is generally not recommended, it is handy to use in such scenarios where you have to build multiple conditions dynamically. You can build the string required to create those masking, then pass it to eval
builtin while using np.select
, iterating each columns in the second dataframe, and assigning the values to a new column.
for col in df2:
colValues = df1[df1['variable'].eq(col)]
condition = '['+\
','.join(colValues.apply(lambda x:f"(df2[col]>{x['lower']}) & (df2[col]<={x['upper']})",
axis=1).values)+\
']'
values = colValues['value']
df2[f'maped_{col}'] = np.select(eval(condition), values, default=np.nan)
OUTPUT:
A B C maped_A maped_B maped_C
0 23 0 2 32.0 NaN 33.0
1 244 2 1 45.0 18.0 NaN
2 131 1 7 15.0 18.0 45.0
3 54 9 4 32.0 46.0 33.0
4 258 7 8 45.0 46.0 45.0
5 117 4 3 15.0 18.0 33.0
6 43 6 8 32.0 46.0 45.0
7 224 3 1 45.0 18.0 NaN
Upvotes: 2