Reputation: 99
I am having to log some pandas dataframe outputs that contain sensitive information. I would rather not have this info in the logs or print in the terminal.
I normally write a little function that can take a string and mask it with a regex, but I am having trouble doing that with a dataframe. Is there anyway to mask a column(s) of sensitive info in a data frame just for logging? The method I have tried below changes the dataframe, making the column unusable down the line.
def hide_by_pd_df_columns(dataframe,columns,replacement=None):
'''hides/replaces a pandas dataframe column with a replacement'''
for column in columns:
replacement = '*****' if replacement is None else replacement
dataframe[column] = replacement
return dataframe
What I want to happen is the ***** mask to only exist in logging and not in the rest of the operations.
Upvotes: 1
Views: 330
Reputation: 2022
Make sure to df.copy the dataframe if you want to leave the original df as is:
def hide_by_pd_df_columns(dataframe,columns,replacement=None):
'''hides/replaces a pandas dataframe column with a replacement'''
df=dataframe.copy()
for column in columns:
replacement = '*****' if replacement is None else replacement
df[column] = replacement
return df
Upvotes: 1