Reputation: 568
I am trying to change the column name of one set of duplicate columns in the dataframe. I form the dataframe from two different csv files after a bunch of pre-processing.
merged_df.columns
is this:
Index(['PortfolioID', 'FirmID', 'PC_City', 'PC_State', 'PC_Country',
'Investment_Date', 'Exit_Date', 'Transaction_Comments', 'PC_City',
'PC_State', 'PC_Country',
'fundSize'],
dtype='object')
The first set of PC_City, PC_State, PC_Country
is for portfolio companies and the next set is for the firms. I tried this but made both the sets same. Then I saw this but it can be done only when importing the csv. This doesn't talk about duplicates.
Is there any way I can achieve this? Mods please help with the editing.
Upvotes: 3
Views: 456
Reputation: 25269
Assume you want to add _Firms
to duplicated values of column
Sample:
df.columns
Out[372]:
Index(['PortfolioID', 'FirmID', 'PC_City', 'PC_State', 'PC_Country',
'Investment_Date', 'Exit_Date', 'Transaction_Comments', 'PC_City',
'PC_State', 'PC_Country', 'fundSize'],
dtype='object')
You may use where
with duplicated
as follows:
cols = df.columns
df.columns = cols.where(~cols.duplicated(), cols + '_Firms')
print(df.columns)
Out[380]:
Index(['PortfolioID', 'FirmID', 'PC_City', 'PC_State', 'PC_Country',
'Investment_Date', 'Exit_Date', 'Transaction_Comments', 'PC_City_Firms',
'PC_State_Firms', 'PC_Country_Firms', 'fundSize'],
dtype='object')
Upvotes: 1
Reputation: 150825
You can try adding the firm
on duplicates:
duplicates = merged_df.columns.duplicated()
merged_df.columns = [x+'_firm' if i else x for x, i in zip(merged_df.columns, duplicates)]
Output:
Index(['PortfolioID', 'FirmID', 'PC_City', 'PC_State', 'PC_Country',
'Investment_Date', 'Exit_Date', 'Transaction_Comments', 'PC_City_firm',
'PC_State_firm', 'PC_Country_firm', 'fundSize'],
dtype='object')
However, from you naming, you should consider passing suffixes=('portfolio', 'firm')
into merge
function that creates merged_df
.
Upvotes: 2