Reputation: 67
I have a df in which I need to rename 40 columns to an empty string. This can be achieved with .rename(), but then I have to list every column to be renamed in a dict. I was looking for a better way to rename columns by pattern matching: wherever a column name contains NULL/UNNAMED, replace the name with an empty string.
df1: original df (in the actual df, I have around 20 columns named NULL1-NULL20 and 20 columns named UNNAMED1-UNNAMED20):
   NULL1  NULL2  C1  C2  UNNAMED1  UNNAMED2
0      1     11  21  31        41        51
1      2     22  22  32        42        52
2      3     33  23  33        43        53
3      4     44  24  34        44        54
desired output df:
          C1  C2
0  1  11  21  31  41  51
1  2  22  22  32  42  52
2  3  33  23  33  43  53
3  4  44  24  34  44  54
This can be achieved by
df.rename(columns={'NULL1':'', 'NULL2':'', 'UNNAMED1':'', 'UNNAMED2':''}, inplace=True)
But I don't want to create a long dictionary of 40 elements.
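To try the answers below, the sample df1 from the question can be rebuilt as follows. This is a minimal sketch assuming pandas is installed; it contains only the 6 columns shown in the table, not the ~40 of the real frame.

```python
import pandas as pd

# Sample frame from the question (6 of the ~40 columns, values copied from the table)
df = pd.DataFrame(
    {
        "NULL1": [1, 2, 3, 4],
        "NULL2": [11, 22, 33, 44],
        "C1": [21, 22, 23, 24],
        "C2": [31, 32, 33, 34],
        "UNNAMED1": [41, 42, 43, 44],
        "UNNAMED2": [51, 52, 53, 54],
    }
)
print(df)
```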
Upvotes: 4
Views: 17580
Reputation: 495
You can use a dict comprehension inside df.rename():
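import numpy as np
SOME_STRING_CONDITION = 'NULL'  # placeholder: the substring to look for in column names
idx_filter = np.asarray([i for i, col in enumerate(df.columns) if SOME_STRING_CONDITION in col], dtype=int)  # dtype=int so an empty match still indexes
df.rename(columns={col: '' for col in df.columns[idx_filter]}, inplace=True)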
In your case, it sounds like SOME_STRING_CONDITION would be 'NULL' or 'UNNAMED'.
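Since the question needs both 'NULL' and 'UNNAMED' at once, the filter can test several substrings with any(). This is a sketch under that assumption; the `patterns` tuple is a hypothetical name, not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 11, 21, 31, 41, 51]],
                  columns=["NULL1", "NULL2", "C1", "C2", "UNNAMED1", "UNNAMED2"])

# Hypothetical: substrings that mark a column for renaming
patterns = ("NULL", "UNNAMED")

# Positions of columns whose name contains any of the patterns;
# dtype=int keeps the array usable as an indexer even when nothing matches
idx_filter = np.asarray(
    [i for i, col in enumerate(df.columns) if any(p in col for p in patterns)],
    dtype=int,
)
df.rename(columns={col: "" for col in df.columns[idx_filter]}, inplace=True)
print(list(df.columns))  # ['', '', 'C1', 'C2', '', '']
```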
I figured this out while looking for help on a thread about a more generic column-renaming issue (Renaming columns in pandas) for a problem of my own. I didn't have enough reputation to add my solution there as an answer or comment (I'm new-ish on Stack Overflow), so I am posting it here!
This solution is also helpful if you need to keep part of the string you were filtering for. For example, if you wanted to replace the "C" in the column names with "col_":
idx_filter = np.asarray([i for i, col in enumerate(df.columns) if 'C' in col])
df.rename(columns={col: col.replace('C', 'col_') for col in df.columns[idx_filter]}, inplace=True)
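Applied to the sample columns from the question, the partial replacement looks like this (a sketch assuming pandas and NumPy; only the names containing "C" change).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 11, 21, 31, 41, 51]],
                  columns=["NULL1", "NULL2", "C1", "C2", "UNNAMED1", "UNNAMED2"])

# Only names containing "C" are touched; the rest keep their labels
idx_filter = np.asarray([i for i, col in enumerate(df.columns) if "C" in col], dtype=int)
df.rename(columns={col: col.replace("C", "col_") for col in df.columns[idx_filter]},
          inplace=True)
print(list(df.columns))  # ['NULL1', 'NULL2', 'col_1', 'col_2', 'UNNAMED1', 'UNNAMED2']
```

Note that str.replace substitutes every occurrence of "C", so a name like "CC1" would become "col_col_1".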
Upvotes: 1
Reputation: 87
If you want to stick with rename:
def renaming_fun(x):
    if "NULL" in x or "UNNAMED" in x:
        return ""  # or None
    return x
df = df.rename(columns=renaming_fun)
It can be handy if the mapping function gets more complex. Otherwise, list comprehensions will do:
df.columns = [renaming_fun(col) for col in df.columns]
Another possibility:
df.columns = list(map(renaming_fun, df.columns))
But, as already mentioned, renaming columns to empty strings is not something you would usually do.
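The variants in this answer produce the same labels. The sketch below checks that, assuming pandas; it also uses Index.map, which is a related pandas method not mentioned in the answer.

```python
import pandas as pd

def renaming_fun(x):
    # Blank out placeholder names, keep everything else
    if "NULL" in x or "UNNAMED" in x:
        return ""
    return x

cols = ["NULL1", "NULL2", "C1", "C2", "UNNAMED1", "UNNAMED2"]
base = pd.DataFrame([[1, 11, 21, 31, 41, 51]], columns=cols)

via_rename = base.rename(columns=renaming_fun)           # rename with a callable mapper
via_listcomp = [renaming_fun(c) for c in base.columns]   # plain list comprehension
via_map = list(map(renaming_fun, base.columns))          # map materialized as a list
via_index_map = base.columns.map(renaming_fun)           # Index.map returns a new Index

print(list(via_rename.columns))  # ['', '', 'C1', 'C2', '', '']
```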
Upvotes: 4
Reputation: 1
df.columns = [col if "NULL" not in col and "UNNAMED" not in col else "" for col in df.columns]
This works because you can rename the columns by assigning a list to the DataFrame's columns attribute.
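Assigning a list to df.columns only works when the list has exactly one label per column; otherwise pandas raises a ValueError. A minimal sketch, assuming pandas:

```python
import pandas as pd

df = pd.DataFrame([[1, 11, 21, 31, 41, 51]],
                  columns=["NULL1", "NULL2", "C1", "C2", "UNNAMED1", "UNNAMED2"])

# One new label per existing column: accepted
df.columns = ["" if ("NULL" in c or "UNNAMED" in c) else c for c in df.columns]
print(list(df.columns))

# Wrong number of labels: rejected with a length-mismatch ValueError
error_raised = False
try:
    df.columns = ["only", "three", "labels"]
except ValueError as exc:
    error_raised = True
    print("ValueError:", exc)
```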
Upvotes: 0
Reputation: 862481
It is possible, but be careful: if you then select one empty-named column, you get all of them, because the column names are duplicated:
print (df[''])
0 1 11 41 51
1 2 22 42 52
2 3 33 43 53
3 4 44 44 54
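The duplicate-name pitfall can be reproduced as follows. This is a sketch assuming pandas; positional access with iloc is one way to reach a single blank-named column.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "NULL1": [1, 2, 3, 4],
        "NULL2": [11, 22, 33, 44],
        "C1": [21, 22, 23, 24],
        "C2": [31, 32, 33, 34],
        "UNNAMED1": [41, 42, 43, 44],
        "UNNAMED2": [51, 52, 53, 54],
    }
)
df.columns = ["" if c.startswith(("NULL", "UNNAMED")) else c for c in df.columns]

blank = df[""]  # every column labelled "" comes back, as a DataFrame
print(type(blank).__name__, blank.shape)  # DataFrame (4, 4)
print(df.columns.duplicated().any())      # True

# Positional access picks exactly one of the blank-named columns
first_blank = df.iloc[:, 0]
```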
Use str.startswith with a tuple of prefixes in a list comprehension to match all such columns:
df.columns = ['' if c.startswith(('NULL','UNNAMED')) else c for c in df.columns]
Your own rename solution can be adapted by building the dict with dict.fromkeys:
d = dict.fromkeys(df.columns[df.columns.str.startswith(('NULL','UNNAMED'))], '')
print (d)
{'NULL1': '', 'NULL2': '', 'UNNAMED1': '', 'UNNAMED2': ''}
df = df.rename(columns=d)
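Since the question asks for pattern matching, the same prefix test can also be written as a regular expression with the vectorized string accessor on the column Index. This is a swapped-in technique (Index.str.replace with regex=True), sketched assuming pandas.

```python
import pandas as pd

df = pd.DataFrame([[1, 11, 21, 31, 41, 51]],
                  columns=["NULL1", "NULL2", "C1", "C2", "UNNAMED1", "UNNAMED2"])

# Whole-name match: NULL or UNNAMED followed by optional digits becomes ""
df.columns = df.columns.str.replace(r"^(?:NULL|UNNAMED)\d*$", "", regex=True)
print(list(df.columns))  # ['', '', 'C1', 'C2', '', '']
```

Because the pattern is anchored at both ends, a name such as "NULL_COUNT" is left alone, unlike a plain substring or prefix test.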
Upvotes: 1
Reputation: 13401
If only a few columns need to keep their names, use a list comprehension:
df.columns = [col if col in ('C1','C2') else "" for col in df.columns]
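The same keep-list idea can be expressed with Index.isin and Index.where, which keep labels where the condition is True and substitute a value elsewhere. A sketch assuming pandas; `keep` is a hypothetical name.

```python
import pandas as pd

df = pd.DataFrame([[1, 11, 21, 31, 41, 51]],
                  columns=["NULL1", "NULL2", "C1", "C2", "UNNAMED1", "UNNAMED2"])

keep = {"C1", "C2"}  # hypothetical: names to preserve; everything else is blanked

# Keep labels found in `keep`, replace the others with ""
df.columns = df.columns.where(df.columns.isin(keep), "")
print(list(df.columns))  # ['', '', 'C1', 'C2', '', '']
```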
Upvotes: 0