Reputation: 1553
I have a mutliple columns with different name format. For example:
df.columns = ['name_column 1 (type1), name-column_2-(type1),...]
I need to replace all characters (except underscore) with underscore. But if there is '-(' , I need just one underscore '_', not two for each special character.
Desired output:
df.columns = ['name_column_1_type1, name_column_2_type1,...]
I have tried with
for element in df.columns:
re.sub('[^A-Za-z0-9]+', '_', element)
print element
But nothing happens, just like in a few other attempts.
Thanks in advance
Upvotes: 0
Views: 1377
Reputation: 863166
df.columns = df.columns.str.replace('[^A-Za-z0-9]+', '_').str.strip('_')
Sample:
df = pd.DataFrame(columns=["'name_column 1 (type1)", 'name-column_2-((type1)'])
print (df.columns.tolist())
["'name_column 1 (type1)", 'name-column_2-((type1)']
df.columns = df.columns.str.replace('[^A-Za-z0-9]+', '_').str.strip('_')
print (df)
Empty DataFrame
Columns: [name_column_1_type1, name_column_2_type1]
Index: []
print (df.columns.tolist())
['name_column_1_type1', 'name_column_2_type1']
Upvotes: 3
Reputation: 82785
Try:
df.columns = [re.sub('[^A-z0-9]', '_', i).replace(" ", "_").replace("__", "_") for i in df.columns]
Upvotes: 1
Reputation: 87124
Nothing happens because the result of re.sub
is not assigned to anything and is therefore lost. You could use a list comprehension and assign the result back to df.columns
:
df.columns = [re.sub('[^A-Za-z0-9]+', '_', element) for element in df.columns]
print df.columns
Still the regex pattern is wrong, but this should get you started.
Upvotes: 1