Jeff

Reputation: 591

Way around auto renaming DataFrame columns

In pandas, when you define a DataFrame and merge it with other DataFrames again and again, overlapping column names either keep their exact name or get a suffix such as _x or _y, depending on how many frames you have merged. This becomes a pain when you have to go back to earlier code and merge in an extra DataFrame: the columns now come out under different names (the exact name, _x, or _y), and every later reference to them has to be updated to the newly created names.

Is there a way around this? Do I need to use a table-type data structure such as HDF5?

Upvotes: 2

Views: 1428

Answers (1)

jezrael

Reputation: 863226

Setting the suffixes parameter of merge may help:

import pandas as pd

left = pd.DataFrame({'k': ['K0', 'K1', 'K2'], 'v': [1, 2, 3]})
print(left)
    k  v
0  K0  1
1  K1  2
2  K2  3

right = pd.DataFrame({'k': ['K0', 'K0', 'K3'], 'v': [4, 5, 6]})
print(right)
    k  v
0  K0  4
1  K0  5
2  K3  6

# An empty suffix keeps the left column name unchanged; only the right one gets '_r'
result = pd.merge(left, right, on='k', suffixes=('', '_r'))
print(result)
    k  v  v_r
0  K0  1    4
1  K0  1    5
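
If several frames are merged one after another, another option is to give each frame's non-key columns an explicit name before merging, so the final names do not depend on how many merges happen or in what order. The sketch below is only an illustration: the helper tag_columns, the third frame extra, and the tags left/right/extra are hypothetical names, not part of pandas or of the question.

```python
import pandas as pd


def tag_columns(df, tag, key="k"):
    """Hypothetical helper: append '_<tag>' to every column except the join key."""
    return df.rename(columns={c: f"{c}_{tag}" for c in df.columns if c != key})


left = pd.DataFrame({"k": ["K0", "K1", "K2"], "v": [1, 2, 3]})
right = pd.DataFrame({"k": ["K0", "K0", "K3"], "v": [4, 5, 6]})
# 'extra' stands in for a frame added to the pipeline later on (assumed data)
extra = pd.DataFrame({"k": ["K0", "K1"], "v": [7, 8]})

# No column names overlap any more, so merge never has to add suffixes itself
result = (
    tag_columns(left, "left")
    .merge(tag_columns(right, "right"), on="k")
    .merge(tag_columns(extra, "extra"), on="k")
)
print(list(result.columns))
```

With this approach, adding or removing a merge step should leave the names of the other columns untouched, since each name is fixed before the merge rather than chosen by it.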

Upvotes: 3
