How can I assign values to a DataFrame based on column-to-value mapping?

Question

I have a DataFrame that looks like this

df = pd.DataFrame({
    "A": ['Y0', 'Y2', 'Y5', 'Y1', 'Y3', 'Y5'],
    "B": ['Y2', 'Y5', 'Y0', 'Y2', 'Y7', 'Y6'],
    "C": ['Y7', 'Y1', 'Y4', 'Y2', 'Y5', 'Y0'],
    "D": ['Y2', 'Y5', 'Y7', 'Y7', 'Y1', 'Y5'],
    "E": ['Y6', 'Y3', 'Y1', 'Y6', 'Y1', 'Y0'],
    "F": ['Y0', 'Y5', 'Y2', 'Y0', 'Y0', 'Y0'],
    "X0": [100, 200, 300, 400, 500, 600],
    "X1": [101, 201, 301, 401, 501, 601],
    "X2": [102, 202, 302, 402, 502, 602],
    "X3": [103, 203, 303, 403, 503, 603],
    "X4": [104, 204, 304, 404, 504, 604],
    "X5": [105, 205, 305, 405, 505, 605],
    "X6": [106, 206, 306, 406, 506, 606],
    "X7": [107, 207, 307, 407, 507, 607]
})

df

    A   B   C   D   E   F   X0  X1  X2  X3  X4  X5  X6  X7
0   Y0  Y2  Y7  Y2  Y6  Y0  100 101 102 103 104 105 106 107
1   Y2  Y5  Y1  Y5  Y3  Y5  200 201 202 203 204 205 206 207
2   Y5  Y0  Y4  Y7  Y1  Y2  300 301 302 303 304 305 306 307
3   Y1  Y2  Y2  Y7  Y6  Y0  400 401 402 403 404 405 406 407
4   Y3  Y7  Y5  Y1  Y1  Y0  500 501 502 503 504 505 506 507
5   Y5  Y6  Y0  Y5  Y0  Y0  600 601 602 603 604 605 606 607

and a mapping

mapping = {
    'Y0': 'X0',
    'Y1': 'X1',
    'Y2': 'X2',
    'Y3': 'X3',
    'Y4': 'X4',
    'Y5': 'X5',
    'Y6': 'X6',
    'Y7': 'X7',
}

In reality, the mapping between the Xs and the Ys is not trivial (but it can be put into code).

I am trying to assign new columns, A_result through F_result, whose values come from columns X0 through X7 using the values in A through F as a key.

For example, the second element of C_result should be 201 because the second element of C is Y1 and the second element of X1 has a value of 201.

The code I have to do this now is

for col in ['A', 'B', 'C', 'D', 'E', 'F']:
    col_result = col + '_result'
    df[col_result] = 0

    for k, v in mapping.items():
        df.loc[df[col] == k, col_result] = df[v]

df.filter(regex='_result', axis=1)

    A_result    B_result    C_result    D_result    E_result    F_result
0   100         102         107         102         106         100
1   202         205         201         205         203         205
2   305         300         304         307         301         302
3   401         402         402         407         406         400
4   503         507         505         501         501         500
5   605         606         600         605         600         600

This works, but it's pretty slow. On a DataFrame with a few hundred rows, it takes a little less than a second to run. What's a faster way to do this?

r.ook · Accepted Answer

Optimized a bit now, you can try this:

df2 = df.iloc[:,:6].apply(lambda x: [df[mapping.get(elem)][i] for i, elem in enumerate(x)]).rename(columns=lambda c: f'{c}_results')

#    A_results  B_results  C_results  D_results  E_results  F_results
# 0        100        102        107        102        106        100
# 1        202        205        201        205        203        205
# 2        305        300        304        307        301        302
# 3        401        402        402        407        406        400
# 4        503        507        505        501        501        500
# 5        605        606        600        605        600        600

Test speed on n=1000:

this method = 6.141038116
your method = 96.079351477

Worth noting that on single runs the performance doesn't seem to vary that much though.

How can I assign values to a DataFrame based on column-to-value mapping?

Answers (1)

Related Questions