Reputation: 326
import pandas as pd
# raw strings keep the backslashes in the column names literal
data = {'First_Column': [1, 2, 3], 'Second_Column': [1, 2, 3],
        r'\First\Mid\LAST.Ending': [1, 2, 3], r'First1\Mid1\LAST1.Ending': [1, 2, 3]}
df = pd.DataFrame(data)
   First_Column  Second_Column  \First\Mid\LAST.Ending  First1\Mid1\LAST1.Ending
0             1              1                       1                         1
1             2              2                       2                         2
2             3              3                       3                         3
I want to rename the columns as follows:
   First_Column  Second_Column  LAST  LAST1
0             1              1     1      1
1             2              2     2      2
2             3              3     3      3
So I tried:
df.columns.str.extract(r'([^\\]+)\.Ending')
       0
0    NaN
1    NaN
2   LAST
3  LAST1
and
import re

col = df.columns.tolist()
for i in col[2:]:
    print(re.search(r'([^\\]+)\.Ending', i).group())
LAST.Ending
LAST1.Ending
The first thing I noticed is that the two approaches give different outputs for the same regex. Why is that? Second, I prefer the version with extract, but how do I keep the original name when there is no match?
Thanks
Upvotes: 2
Views: 113
Reputation: 23099
Another method is to use df.filter to find your target columns, then pass a dict to rename after applying your regex:
s = df.filter(like='\\', axis=1).columns
s1 = s.str.extract(r'([^\\]+)\.Ending')[0].tolist()
# rename returns a new DataFrame, so assign the result back
df = df.rename(columns=dict(zip(s, s1)))
print(df)
   First_Column  Second_Column  LAST  LAST1
0             1              1     1      1
1             2              2     2      2
2             3              3     3      3
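A variation on the same rename idea, in case it is useful: rename also accepts a function, which is called once per column name, so names that do not match the regex can simply be returned unchanged. This is only a sketch using the sample frame from the question; the helper name short_name is made up for illustration.

```python
import re
import pandas as pd

data = {
    'First_Column': [1, 2, 3],
    'Second_Column': [1, 2, 3],
    r'\First\Mid\LAST.Ending': [1, 2, 3],
    r'First1\Mid1\LAST1.Ending': [1, 2, 3],
}
df = pd.DataFrame(data)

pattern = re.compile(r'([^\\]+)\.Ending')

def short_name(col):
    # hypothetical helper: return the captured part, or the original name if no match
    m = pattern.search(col)
    return m.group(1) if m else col

# rename returns a new DataFrame; df itself is left untouched
renamed = df.rename(columns=short_name)
print(renamed.columns.tolist())
```

Because the function falls back to the original name, no separate filter step or NaN handling is needed.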
Upvotes: 2
Reputation: 150765
You can use np.where to fill in the original name where the regex doesn't match:
import numpy as np

s = df.columns.str.extract(r'([^\\]+)\.Ending')[0]
# keep the original name where the regex did not match
df.columns = np.where(s.isna(), df.columns, s)
# equivalently
# df.columns = s.mask(s.isna(), df.columns.values)
Output:
   First_Column  Second_Column  LAST  LAST1
0             1              1     1      1
1             2              2     2      2
2             3              3     3      3
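As for the first part of the question (why the two outputs differ): str.extract returns only the capture groups, while calling .group() with no argument on a match object returns the whole match (group 0), including the ".Ending" part. Asking for .group(1) should give the same text as extract. A small sketch on two of the sample column names; the variable names are just for illustration:

```python
import re
import pandas as pd

cols = pd.Index(['First_Column', r'\First\Mid\LAST.Ending'])
pattern = r'([^\\]+)\.Ending'

m = re.search(pattern, cols[1])
whole = m.group()      # group 0: everything the pattern matched
captured = m.group(1)  # only the text inside the parentheses

# str.extract keeps just the capture group, with NaN where nothing matched
extracted = cols.str.extract(pattern)[0].tolist()

print(whole)
print(captured)
print(extracted)
```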
Upvotes: 3