Reputation: 69
I am trying to rename each unnamed column to the previous column's name with a number appended, so that every column name is unique. Is there a way to achieve this?
CurrentDF:
Reconnaissance   Unnamed: 1          Resource Development    Unnamed: 3  Initial Access       Unnamed: 5
Active Scanning  Scanning IP Blocks  Acquire Infrastructure  Botnet      Drive-by Compromise  NaN
Desired:
Reconnaissance   Reconnaissance_1    Resource Development    Resource Development_1  Initial Access       Initial Access_1
Active Scanning  Scanning IP Blocks  Acquire Infrastructure  Botnet                  Drive-by Compromise  NaN
Upvotes: 3
Views: 543
Reputation: 118
To rename all columns at once, assign a list containing one new name per column:
df.columns = [col1, col2, col3]
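As a minimal sketch of that assignment (the frame and labels below are made up for illustration), note that the list has to contain exactly one label per existing column, otherwise pandas raises a ValueError:

```python
import pandas as pd

# Hypothetical three-column frame, used only for illustration.
df = pd.DataFrame([[1, 2, 3]], columns=["a", "Unnamed: 1", "b"])

# One new label per existing column, in positional order.
df.columns = ["a", "a_1", "b"]
print(list(df.columns))
```

This works when the new names are already known; it does not derive them from the existing labels.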
Upvotes: 0
Reputation: 59579
You can create a Series from the columns (because Index objects have no ffill method, which is useful here). Then determine which columns start with Unnamed, mask them, and use a cumcount to figure out what number to add onto the end (in case there are multiple consecutive Unnamed: columns). Use ffill to get the previous column label that didn't start with 'Unnamed', then assign this Series back to the columns.
import pandas as pd
import numpy as np
df = pd.DataFrame(columns=['Reconnaissance', 'Unnamed: 1', 'Resource Development',
                           'Unnamed: 3', 'Initial Access', 'Unnamed: 5'],
                  data=1, index=[0])
s = pd.Series(df.columns)
s = s.mask(s.str.startswith('Unnamed:'))
s = (s.ffill()
     + s.groupby(s.notnull().cumsum()).cumcount().astype(str).radd('_').replace('_0', ''))
df.columns = s
print(df)
   Reconnaissance  Reconnaissance_1  Resource Development  Resource Development_1  Initial Access  Initial Access_1
0               1                 1                     1                       1               1                 1
And here's another example to show how this behaves with less regularly spaced 'Unnamed:' columns.
df = pd.DataFrame(columns=['a', 'Unnamed: 1', 'Unnamed: 2', 'b', 'c', 'Unnamed: 3'],
                  data=[np.arange(6)], index=[0])
#### Same code as above
print(df)
   a  a_1  a_2  b  c  c_1
0  0    1    2  3  4    5
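If this needs to be reused, the same steps could be wrapped in a helper. The sketch below is only one way to package it: the name fill_unnamed_columns and the prefix parameter are made up for illustration, and it assumes every column label is a string and that the first column is not itself an 'Unnamed:' column (otherwise there is no earlier label to fill from).

```python
import numpy as np
import pandas as pd


def fill_unnamed_columns(df, prefix="Unnamed:"):
    """Return a copy of df with placeholder labels replaced by '<previous label>_<n>'.

    Hypothetical helper; assumes string labels and a real label in the first column.
    """
    s = pd.Series(df.columns)
    # Turn placeholder labels into NaN so they can be forward-filled.
    s = s.mask(s.str.startswith(prefix))
    # Every real label starts a new group; cumcount numbers the
    # placeholders that follow it (0 for the real label itself).
    counter = s.groupby(s.notnull().cumsum()).cumcount()
    # '_1', '_2', ... for placeholders, empty string for real labels.
    suffix = counter.astype(str).radd("_").replace("_0", "")
    out = df.copy()
    out.columns = (s.ffill() + suffix).tolist()
    return out


df = pd.DataFrame(
    [np.arange(6)],
    columns=["a", "Unnamed: 1", "Unnamed: 2", "b", "c", "Unnamed: 3"],
)
renamed = fill_unnamed_columns(df)
print(list(renamed.columns))
```

Returning a copy rather than modifying the frame in place is a design choice of this sketch, so the original labels stay available for comparison.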
Upvotes: 3
Reputation: 23166
If it's every second column that needs renaming, you can use:
df = df.rename(columns = {df.columns[i]: f"{df.columns[i-1]}_1" for i in range(1, len(df.columns),2)})
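For reference, here is a small self-contained run of that one-liner on a frame shaped like the one in the question (the data values are placeholders). It relies on the columns strictly alternating between a named and an unnamed label, so two consecutive 'Unnamed:' columns would not be handled correctly.

```python
import pandas as pd

# Frame mirroring the question's column layout; values are placeholders.
df = pd.DataFrame(
    [[1, 1, 1, 1, 1, 1]],
    columns=["Reconnaissance", "Unnamed: 1", "Resource Development",
             "Unnamed: 3", "Initial Access", "Unnamed: 5"],
)

cols = df.columns
# Map every second column (positions 1, 3, 5, ...) to '<previous label>_1'.
mapping = {cols[i]: f"{cols[i - 1]}_1" for i in range(1, len(cols), 2)}
df = df.rename(columns=mapping)
print(list(df.columns))
```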
Upvotes: 4