Thomas Cannon

Reputation: 99

How can I mask a pandas dataframe column in logging output?

I need to log some pandas dataframe outputs that contain sensitive information, and I would rather not have that information appear in the logs or be printed in the terminal.

I normally write a small function that takes a string and masks it with a regex, but I am having trouble doing the same with a dataframe. Is there any way to mask one or more columns of sensitive information in a dataframe just for logging? The method I tried below changes the dataframe itself, making the column unusable further down the line.

def hide_by_pd_df_columns(dataframe, columns, replacement=None):
    '''hides/replaces a pandas dataframe column with a replacement'''
    replacement = '*****' if replacement is None else replacement
    for column in columns:
        # this writes into the caller's dataframe, so the original values are lost
        dataframe[column] = replacement
    return dataframe

What I want is for the ***** mask to exist only in the logging output, not in the rest of the operations.
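A minimal reproduction of why this happens, assuming a small made-up dataframe with a hypothetical ssn column: the function receives a reference to the caller's DataFrame object, not a copy, so assigning to dataframe[column] overwrites the caller's data.

```python
import pandas as pd

def hide_by_pd_df_columns(dataframe, columns, replacement=None):
    '''hides/replaces a pandas dataframe column with a replacement'''
    replacement = '*****' if replacement is None else replacement
    for column in columns:
        # Writes into the caller's DataFrame: no copy is made
        dataframe[column] = replacement
    return dataframe

# Hypothetical sample data, for illustration only
df = pd.DataFrame({'name': ['a', 'b'], 'ssn': ['111-22-3333', '444-55-6666']})
masked = hide_by_pd_df_columns(df, ['ssn'])

print(masked is df)        # True: the same object was returned
print(df['ssn'].tolist())  # ['*****', '*****']: the original values are gone
```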

Upvotes: 1

Views: 330

Answers (1)

Ricky Kim

Reputation: 2022

Make a copy of the dataframe with dataframe.copy() if you want to leave the original df unchanged:

def hide_by_pd_df_columns(dataframe, columns, replacement=None):
    '''returns a copy of the dataframe with the given columns replaced'''
    replacement = '*****' if replacement is None else replacement
    df = dataframe.copy()  # work on a copy so the caller's dataframe is untouched
    for column in columns:
        df[column] = replacement
    return df
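A sketch of how the copy-based function might be applied only at the logging call, assuming the standard logging module and the same hypothetical ssn column. DataFrame.assign also returns a new DataFrame rather than modifying the original, so it can serve as a one-line alternative; a scalar value passed to assign is broadcast to every row.

```python
import logging
import pandas as pd

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

def hide_by_pd_df_columns(dataframe, columns, replacement=None):
    '''returns a copy of the dataframe with the given columns replaced'''
    replacement = '*****' if replacement is None else replacement
    df = dataframe.copy()  # the caller's dataframe is never modified
    for column in columns:
        df[column] = replacement
    return df

# Hypothetical sample data, for illustration only
df = pd.DataFrame({'name': ['a', 'b'], 'ssn': ['111-22-3333', '444-55-6666']})

# Mask only inside the log message; df keeps the real values
logger.info('\n%s', hide_by_pd_df_columns(df, ['ssn']))

# Alternative: assign() returns a new DataFrame and leaves df as is
logger.info('\n%s', df.assign(**{col: '*****' for col in ['ssn']}))

print(df['ssn'].tolist())  # ['111-22-3333', '444-55-6666']
```

Note that the masked copy is built before logger.info is called, so it is created even when the INFO level is disabled; for very large dataframes you may want to check logger.isEnabledFor(logging.INFO) first.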

Upvotes: 1
