Reputation: 591
In pandas, when you define a DataFrame and keep merging it with other DataFrames, overlapping column names either keep their exact name or get a _x or _y suffix, depending on how many merges you have done. This becomes a pain when you go back to earlier code and merge in an extra DataFrame: the columns then come out with different names (the exact name, _x, or _y), and every later reference to those columns has to be updated to the newly created names.
Is there a way around this? Do I need to use a table-like data structure such as HDF5?
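To make the behaviour concrete, here is a minimal sketch of the renaming described above. The frames a, b, and c and their values are made up for illustration; only the default suffixes of pd.merge are relied on.

```python
import pandas as pd

# Three hypothetical frames that all share the key 'k' and a value column 'v'.
a = pd.DataFrame({'k': ['K0', 'K1'], 'v': [1, 2]})
b = pd.DataFrame({'k': ['K0', 'K1'], 'v': [3, 4]})
c = pd.DataFrame({'k': ['K0', 'K1'], 'v': [5, 6]})

# First merge: both sides have 'v', so pandas applies the default suffixes.
ab = pd.merge(a, b, on='k')
print(list(ab.columns))   # ['k', 'v_x', 'v_y']

# Second merge: 'v' no longer clashes with anything, so it keeps its exact name.
abc = pd.merge(ab, c, on='k')
print(list(abc.columns))  # ['k', 'v_x', 'v_y', 'v']

# Changing the merge order yields the same names, but they now refer to
# different source frames: here 'v_y' comes from c and 'v' comes from b.
acb = pd.merge(pd.merge(a, c, on='k'), b, on='k')
print(list(acb.columns))  # ['k', 'v_x', 'v_y', 'v']
```

So the names depend on the order and number of merges, which is why inserting one extra merge earlier in the code shifts what every later column name means.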
Upvotes: 2
Views: 1428
Reputation: 863226
Setting the suffixes parameter in merge may help:
import pandas as pd

left = pd.DataFrame({'k': ['K0', 'K1', 'K2'], 'v': [1, 2, 3]})
print(left)
#     k  v
# 0  K0  1
# 1  K1  2
# 2  K2  3

right = pd.DataFrame({'k': ['K0', 'K0', 'K3'], 'v': [4, 5, 6]})
print(right)
#     k  v
# 0  K0  4
# 1  K0  5
# 2  K3  6

# Keep the left column name unchanged and tag only the right one with '_r'.
result = pd.merge(left, right, on='k', suffixes=['', '_r'])
print(result)
#     k  v  v_r
# 0  K0  1    4
# 1  K0  1    5
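Another option, if you want names that do not depend on how many merges happen or in what order, is to rename the non-key columns of each frame before merging so that nothing overlaps. The sketch below is only one way to do that: tag_columns is a hypothetical helper (not part of pandas), and the extra frame and its values are made up for illustration.

```python
import pandas as pd

def tag_columns(df, tag, key='k'):
    """Return a copy of df with '_<tag>' appended to every column except the key.

    Hypothetical helper for illustration; it only wraps DataFrame.rename.
    """
    return df.rename(columns={c: f"{c}_{tag}" for c in df.columns if c != key})

left = pd.DataFrame({'k': ['K0', 'K1', 'K2'], 'v': [1, 2, 3]})
right = pd.DataFrame({'k': ['K0', 'K0', 'K3'], 'v': [4, 5, 6]})
extra = pd.DataFrame({'k': ['K0', 'K1'], 'v': [7, 8]})  # made-up third frame

# Each frame carries its own tag, so no columns overlap and no suffixes are applied.
result = tag_columns(left, 'left').merge(tag_columns(right, 'right'), on='k')
result = result.merge(tag_columns(extra, 'extra'), on='k')

print(list(result.columns))  # ['k', 'v_left', 'v_right', 'v_extra']
```

Because every column is named after the frame it came from, adding or removing a merge earlier in the code leaves the other names untouched.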
Upvotes: 3