Reputation: 5862
I have a DataFrame that looks like this
df = pd.DataFrame({
"A": ['Y0', 'Y2', 'Y5', 'Y1', 'Y3', 'Y5'],
"B": ['Y2', 'Y5', 'Y0', 'Y2', 'Y7', 'Y6'],
"C": ['Y7', 'Y1', 'Y4', 'Y2', 'Y5', 'Y0'],
"D": ['Y2', 'Y5', 'Y7', 'Y7', 'Y1', 'Y5'],
"E": ['Y6', 'Y3', 'Y1', 'Y6', 'Y1', 'Y0'],
"F": ['Y0', 'Y5', 'Y2', 'Y0', 'Y0', 'Y0'],
"X0": [100, 200, 300, 400, 500, 600],
"X1": [101, 201, 301, 401, 501, 601],
"X2": [102, 202, 302, 402, 502, 602],
"X3": [103, 203, 303, 403, 503, 603],
"X4": [104, 204, 304, 404, 504, 604],
"X5": [105, 205, 305, 405, 505, 605],
"X6": [106, 206, 306, 406, 506, 606],
"X7": [107, 207, 307, 407, 507, 607]
})
df
A B C D E F X0 X1 X2 X3 X4 X5 X6 X7
0 Y0 Y2 Y7 Y2 Y6 Y0 100 101 102 103 104 105 106 107
1 Y2 Y5 Y1 Y5 Y3 Y5 200 201 202 203 204 205 206 207
2 Y5 Y0 Y4 Y7 Y1 Y2 300 301 302 303 304 305 306 307
3 Y1 Y2 Y2 Y7 Y6 Y0 400 401 402 403 404 405 406 407
4 Y3 Y7 Y5 Y1 Y1 Y0 500 501 502 503 504 505 506 507
5 Y5 Y6 Y0 Y5 Y0 Y0 600 601 602 603 604 605 606 607
and a mapping
mapping = {
'Y0': 'X0',
'Y1': 'X1',
'Y2': 'X2',
'Y3': 'X3',
'Y4': 'X4',
'Y5': 'X5',
'Y6': 'X6',
'Y7': 'X7',
}
In reality, the mapping between the Xs and the Ys is not trivial (but it can be put into code).
I am trying to assign new columns, A_result
through F_result
, whose values come from columns X0
through X7
using the values in A
through F
as a key.
For example, the second element of C_result
should be 201
because the second element of C
is Y1
and the second element of X1
has a value of 201
.
The code I have to do this now is
for col in ['A', 'B', 'C', 'D', 'E', 'F']:
col_result = col + '_result'
df[col_result] = 0
for k, v in mapping.items():
df.loc[df[col] == k, col_result] = df[v]
df.filter(regex='_result', axis=1)
A_result B_result C_result D_result E_result F_result
0 100 102 107 102 106 100
1 202 205 201 205 203 205
2 305 300 304 307 301 302
3 401 402 402 407 406 400
4 503 507 505 501 501 500
5 605 606 600 605 600 600
This works, but it's pretty slow. On a DataFrame with a few hundred rows, it takes a little less than a second to run. What's a faster way to do this?
Upvotes: 3
Views: 70
Reputation: 13888
Optimized a bit now, you can try this:
df2 = df.iloc[:,:6].apply(lambda x: [df[mapping.get(elem)][i] for i, elem in enumerate(x)]).rename(columns=lambda c: f'{c}_results')
# A_results B_results C_results D_results E_results F_results
# 0 100 102 107 102 106 100
# 1 202 205 201 205 203 205
# 2 305 300 304 307 301 302
# 3 401 402 402 407 406 400
# 4 503 507 505 501 501 500
# 5 605 606 600 605 600 600
Test speed on n=1000
:
this method = 6.141038116
your method = 96.079351477
Worth noting that on single runs the performance doesn't seem to vary that much though.
Upvotes: 2