Reputation: 2605
I have a df X with columns with duplicate names:
In [77]: X_R
Out[77]:
dollars dollars
0 0.7085 0.5000
I want to rename it so that I have:
In [77]: X_R
Out[77]:
Retail Cost
0 0.7085 0.5000
Using the Pandas rename function does' work:
X_R.rename(index=str, columns={"dollars": "Retail", "dollars": "Cost"})
Just gives me two columns named Cost.
How can I rename the columns in this case?
Upvotes: 25
Views: 55316
Reputation: 1846
For the relatively simple case in the OP's question, Mihkorz's answer works well. However, for dynamic renaming, more is needed. There are some answers that provide dynamic renaming. As noted in a comment by @wjakobw in n49o7's answer,
if you have multiple sets of duplicates they aren't handled independently. a, a, b, b becomes a0, a1, b2, b3.
gbitmmon's answer avoids the shared index issue, as does Benedictanjw's answer. The latter has the code for managing the renames both outside and inside of the rename call. This answer is to provide a similar dynamic renaming solution to the former, but using collections.Counter instead of manually tracking the added elements and their indexes; and adding a dynamically selected separator instead of a hard-coded _
.
import pandas # Yes, I know the custom is to do `import pandas as pd`.
from collections import Counter
class ColumnRenamer:
def __init__(self, separator=None):
self.counter = Counter()
self.sep = separator
def __call__(self, x):
index = self.counter[x] # Counter returns 0 for missing elements
self.counter[x] = index + 1 # Uses something like `setdefault`
return f'{x}{self.sep if self.sep and index else ""}{index if index else ""}'
Using this DataFrame
with multiple duplicates, of varying counts, and not all having duplicates:
df = pandas.DataFrame(
[[0,1,2,4,3,2,1,4,2,1,0,2,3], [4,3,2,3,1,0,1,2,1,0,2,4,1], [1,2,3,2,1,3,0,4,1,2,2,3,1]],
columns=['x', 'x', 'q', 'y', 'y', 'z', 'z', 'x', 'z', 'z', 'z', 'z', 'x'],
)
df
Out[28]:
x x q y y z z x z z z z x
0 0 1 2 4 3 2 1 4 2 1 0 2 3
1 4 3 2 3 1 0 1 2 1 0 2 4 1
2 1 2 3 2 1 3 0 4 1 2 2 3 1
with the above ColumnRenamer
class gives completely unique column names, no shared repeat indexes, and whatever separator you desire, including an empty string:
df.rename(columns=ColumnRenamer(separator='#'))
Out[29]:
x x#1 q y y#1 z z#1 x#2 z#2 z#3 z#4 z#5 x#3
0 0 1 2 4 3 2 1 4 2 1 0 2 3
1 4 3 2 3 1 0 1 2 1 0 2 4 1
2 1 2 3 2 1 3 0 4 1 2 2 3 1
df.rename(columns=ColumnRenamer(separator='_'))
Out[30]:
x x_1 q y y_1 z z_1 x_2 z_2 z_3 z_4 z_5 x_3
0 0 1 2 4 3 2 1 4 2 1 0 2 3
1 4 3 2 3 1 0 1 2 1 0 2 4 1
2 1 2 3 2 1 3 0 4 1 2 2 3 1
df.rename(columns=ColumnRenamer(separator=''))
Out[31]:
x x1 q y y1 z z1 x2 z2 z3 z4 z5 x3
0 0 1 2 4 3 2 1 4 2 1 0 2 3
1 4 3 2 3 1 0 1 2 1 0 2 4 1
2 1 2 3 2 1 3 0 4 1 2 2 3 1
df.rename(columns=ColumnRenamer())
Out[32]:
x x1 q y y1 z z1 x2 z2 z3 z4 z5 x3
0 0 1 2 4 3 2 1 4 2 1 0 2 3
1 4 3 2 3 1 0 1 2 1 0 2 4 1
2 1 2 3 2 1 3 0 4 1 2 2 3 1
Upvotes: 0
Reputation: 546
Not directly an answer, but since this a top search result, here is a short and flexible solution to append a suffix to duplicate column names:
# A dataframe with duplicated column names
df = pd.DataFrame([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
df.columns = ['a', 'b', 'b']
# Columns to not rename
excluded = df.columns[~df.columns.duplicated(keep=False)]
# An incrementer
import itertools
inc = itertools.count().__next__
# A renamer
def ren(name):
return f"{name}{inc()}" if name not in excluded else name
# Use inside rename()
df.rename(columns=ren)
a b b a b0 b1
0 1 2 3 0 1 2 3
1 4 5 6 => 1 4 5 6
2 7 8 8 2 7 8 9
Upvotes: 7
Reputation: 838
MaxU's answer helped me with this same problem. In this answer, I add in a way to find those duplicated column headers.
First, we make a dictionary of the duplicated column names with values corresponding to the desired new column names. For this, the defaultdict subclass is required.
import pandas as pd
from collections import defaultdict
renamer = defaultdict()
We iterate over the duplicated column names to create a dictionary with keys being the duplicated column name and values being a list of new column names. I have chosen this list to be original name_0, original name_1, and so on.
for column_name in df.columns[df.columns.duplicated(keep=False)].tolist():
if column_name not in renamer:
renamer[column_name] = [column_name+'_0']
else:
renamer[column_name].append(column_name +'_'+str(len(renamer[column_name])))
print(renamer)
defaultdict(None, {'b': ['b_0', 'b_1', 'b_2', 'b_3'], 'c': ['c_0', 'c_1']})
Original dataframe:
print(df)
a b b b b c c d
Item 0 2 1 0 2 8 3 9 5
Item 1 3 2 7 3 5 4 6 2
Item 2 4 3 8 1 5 7 4 4
Item 3 5 5 3 6 0 5 2 5
Rename the duplicated columns by assigning the new names from our renamer defaultdict, leaving the unduplicated columns alone
df.rename(
columns=lambda column_name: renamer[column_name].pop(0)
if column_name in renamer
else column_name
)
a b_0 b_1 b_2 b_3 c_0 c_1 d
Item 0 2 1 0 2 8 3 9 5
Item 1 3 2 7 3 5 4 6 2
Item 2 4 3 8 1 5 7 4 4
Item 3 5 5 3 6 0 5 2 5
(As a sidenote, a couple of people have questioned why duplicated column names existed in the first place. For myself, I encountered duplicated column names when importing with the xlwings package (to deal with password-protected Excel files). You could also inadvertently create duplicate column names by using pd.concat
.
Upvotes: 6
Reputation: 4322
Here is another dynamic solution that I think is nicer
In [59]: df
Out[59]:
a x x x z
0 6 2 7 7 8
1 6 6 3 1 1
2 6 6 7 5 6
3 8 3 6 1 8
4 5 7 5 3 0
In [61]: class renamer():
def __init__(self):
self.d = dict()
def __call__(self, x):
if x not in self.d:
self.d[x] = 0
return x
else:
self.d[x] += 1
return "%s_%d" % (x, self.d[x])
df.rename(columns=renamer())
Out[61]:
a x x_1 x_2 z
0 6 2 7 7 8
1 6 6 3 1 1
2 6 6 7 5 6
3 8 3 6 1 8
4 5 7 5 3 0
Upvotes: 25
Reputation: 210842
Here is a dynamic solution:
In [59]: df
Out[59]:
a x x x z
0 6 2 7 7 8
1 6 6 3 1 1
2 6 6 7 5 6
3 8 3 6 1 8
4 5 7 5 3 0
In [60]: d
Out[60]: {'x': ['x1', 'x2', 'x3']}
In [61]: df.rename(columns=lambda c: d[c].pop(0) if c in d.keys() else c)
Out[61]:
a x1 x2 x3 z
0 6 2 7 7 8
1 6 6 3 1 1
2 6 6 7 5 6
3 8 3 6 1 8
4 5 7 5 3 0
Upvotes: 28