RnK
RnK

Reputation: 117

Mapping pandas dataframe by condition

I am trying to write a function that maps between 2 data-frames and returns a value based on conditions. One is a mapping DF, given below.

data1 = {'variable':['A','A','A','B','B','C','C'],
    'lower' : [0,100,200,0,5,1,4],
    'upper' : [100,200,300,5,10,4,8],
    'value' : [32,15,45,18,46,33,45]}

df1 = pd.DataFrame(data1, columns = ['variable','lower', 'upper','value'])

print(df1)

mapping table

The other is the DF where I would apply the mapping, given below.

data2 = {'A':[23,244,131,54,258,117,43,224],
    'B' : [0,2,1,9,7,4,6,3],
    'C' : [2,1,7,4,8,3,8,1]}

df2 = pd.DataFrame(data2, columns = ['A','B', 'C'])

print(df2)

data table

What I am trying to do is create new column mapped_A, which will check the mapping DF for all rows with A and return the value based on the upper and lower ranges. For example the first row for mapped_A column will be 32 since 0 < 23 <= 100. Same with columns B and C.

Desired Output:

final table

I can hard-code the logic something like the following, but there must be better ways to handle this.

conditions = [(df2['A'] > 0) & (df2['A'] <= 100),
              (df2['A'] > 100) & (df2['A'] <= 200),
              (df2['A'] > 200) & (df2['A'] <= 300)]
choices = [32, 15, 45]

df2['mapped_A'] = np.select(conditions, choices, default='null')

Upvotes: 1

Views: 2744

Answers (2)

Aejjee
Aejjee

Reputation: 189

Well I think ThePyGuy's solution is much more graceful but you could make a custom function that would save you the copy and pasting like so:

def maps2(d1, d2):
    out = {}
    for i in d2:
        condNums = [n for n in range(len(d1["variable"])) if d1["variable"][n] == i]
        n = "mapped_" + i
        out[n] = []
        for u in range(len(d2[i])):
            for o in condNums:
                if d2[i][u] >= d1["lower"][o] and d2[i][u] <= d1["upper"][o]:
                    out[n].append(d1["value"][o])

    return out

Also I changed your logic conditions in the if because you're desired output has conflicts with your original logic (the > is now >=).

Upvotes: 3

ThePyGuy
ThePyGuy

Reputation: 18416

Even though use of eval is generally not recommended, it is handy to use in such scenarios where you have to build multiple conditions dynamically. You can build the string required to create those masking, then pass it to eval builtin while using np.select, iterating each columns in the second dataframe, and assigning the values to a new column.

for col in df2:
    colValues = df1[df1['variable'].eq(col)]
    condition = '['+\
                ','.join(colValues.apply(lambda x:f"(df2[col]>{x['lower']}) & (df2[col]<={x['upper']})", 
                                         axis=1).values)+\
                ']'
    values = colValues['value']
    df2[f'maped_{col}'] = np.select(eval(condition), values, default=np.nan)

OUTPUT:

     A  B  C  maped_A  maped_B  maped_C
0   23  0  2     32.0      NaN     33.0
1  244  2  1     45.0     18.0      NaN
2  131  1  7     15.0     18.0     45.0
3   54  9  4     32.0     46.0     33.0
4  258  7  8     45.0     46.0     45.0
5  117  4  3     15.0     18.0     33.0
6   43  6  8     32.0     46.0     45.0
7  224  3  1     45.0     18.0      NaN

Upvotes: 2

Related Questions