Mapping pandas dataframe by condition

Question

I am trying to write a function that maps between 2 data-frames and returns a value based on conditions. One is a mapping DF, given below.

data1 = {'variable':['A','A','A','B','B','C','C'],
    'lower' : [0,100,200,0,5,1,4],
    'upper' : [100,200,300,5,10,4,8],
    'value' : [32,15,45,18,46,33,45]}

df1 = pd.DataFrame(data1, columns = ['variable','lower', 'upper','value'])

print(df1)

The other is the DF where I would apply the mapping, given below.

data2 = {'A':[23,244,131,54,258,117,43,224],
    'B' : [0,2,1,9,7,4,6,3],
    'C' : [2,1,7,4,8,3,8,1]}

df2 = pd.DataFrame(data2, columns = ['A','B', 'C'])

print(df2)

What I am trying to do is create new column mapped_A, which will check the mapping DF for all rows with A and return the value based on the upper and lower ranges. For example the first row for mapped_A column will be 32 since 0 < 23 <= 100. Same with columns B and C.

Desired Output:

I can hard-code the logic something like the following, but there must be better ways to handle this.

conditions = [(df2['A'] > 0) & (df2['A'] <= 100),
              (df2['A'] > 100) & (df2['A'] <= 200),
              (df2['A'] > 200) & (df2['A'] <= 300)]
choices = [32, 15, 45]

df2['mapped_A'] = np.select(conditions, choices, default='null')

ThePyGuy · Accepted Answer

Even though use of eval is generally not recommended, it is handy to use in such scenarios where you have to build multiple conditions dynamically. You can build the string required to create those masking, then pass it to eval builtin while using np.select, iterating each columns in the second dataframe, and assigning the values to a new column.

for col in df2:
    colValues = df1[df1['variable'].eq(col)]
    condition = '['+\
                ','.join(colValues.apply(lambda x:f"(df2[col]>{x['lower']}) & (df2[col]<={x['upper']})", 
                                         axis=1).values)+\
                ']'
    values = colValues['value']
    df2[f'maped_{col}'] = np.select(eval(condition), values, default=np.nan)

OUTPUT:

     A  B  C  maped_A  maped_B  maped_C
0   23  0  2     32.0      NaN     33.0
1  244  2  1     45.0     18.0      NaN
2  131  1  7     15.0     18.0     45.0
3   54  9  4     32.0     46.0     33.0
4  258  7  8     45.0     46.0     45.0
5  117  4  3     15.0     18.0     33.0
6   43  6  8     32.0     46.0     45.0
7  224  3  1     45.0     18.0      NaN

Mapping pandas dataframe by condition

Answers (2)

Related Questions