jovicbg
jovicbg

Reputation: 1553

Replace all characters at once in Pandas dataframe

I have a mutliple columns with different name format. For example:

df.columns = ['name_column 1 (type1), name-column_2-(type1),...]

I need to replace all characters (except underscore) with underscore. But if there is '-(' , I need just one underscore '_', not two for each special character.

Desired output:

df.columns = ['name_column_1_type1, name_column_2_type1,...]

I have tried with

for element in df.columns:
    re.sub('[^A-Za-z0-9]+', '_', element)
    print element

But nothing happens, just like in a few other attempts.

Thanks in advance

Upvotes: 0

Views: 1377

Answers (3)

jezrael
jezrael

Reputation: 863166

Use replace + strip:

df.columns = df.columns.str.replace('[^A-Za-z0-9]+', '_').str.strip('_')

Sample:

df = pd.DataFrame(columns=["'name_column 1 (type1)", 'name-column_2-((type1)'])
print (df.columns.tolist())
["'name_column 1 (type1)", 'name-column_2-((type1)']

df.columns =  df.columns.str.replace('[^A-Za-z0-9]+', '_').str.strip('_')
print (df)
Empty DataFrame
Columns: [name_column_1_type1, name_column_2_type1]
Index: []

print (df.columns.tolist())
['name_column_1_type1', 'name_column_2_type1']

Upvotes: 3

Rakesh
Rakesh

Reputation: 82785

Try:

df.columns = [re.sub('[^A-z0-9]', '_', i).replace(" ", "_").replace("__", "_") for i in df.columns]

Upvotes: 1

mhawke
mhawke

Reputation: 87124

Nothing happens because the result of re.sub is not assigned to anything and is therefore lost. You could use a list comprehension and assign the result back to df.columns:

df.columns = [re.sub('[^A-Za-z0-9]+', '_', element) for element in df.columns]
print df.columns

Still the regex pattern is wrong, but this should get you started.

Upvotes: 1

Related Questions