Reputation: 3007
I have a codebase where this pattern is very common:
df  # Some pandas dataframe with columns userId, sessionId

def add_session_statistics(df):
    df_statistics = get_session_statistics(df.sessionId.unique())
    return df.merge(df_statistics, on='sessionId', how='left')

def add_user_statistics(df):
    # get_user_statistics: assumed counterpart of get_session_statistics
    df_statistics = get_user_statistics(df.userId.unique())
    return df.merge(df_statistics, on='userId', how='left')

# etc..

df_enriched = (df
    .pipe(add_session_statistics)
    .pipe(add_user_statistics)
)
However, in another part of the codebase I have 'userId', 'sessionId' as the index of the dataframe. Something like:
X = df.set_index(['userId', 'sessionId'])
This means I can't use the add_{something}_statistics() functions on X without resetting the index each time.
Is there a decorator I can add to the add_{something}_statistics() functions to make them reset the index if they get a KeyError when attempting the merge on a column that is not there?
Upvotes: 0
Views: 35
Reputation: 3007
This seems to work:
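import functools

def index_suspension_on_add(add_function):
    @functools.wraps(add_function)  # keep the wrapped function's name and docstring
    def _helper(df):
        try:
            return df.pipe(add_function)
        except (KeyError, AttributeError):
            # df['userId'] raises KeyError and df.userId raises AttributeError
            # when the column lives in the index, so retry on a reset index
            index_names = list(df.index.names)
            return (df
                .reset_index()
                .pipe(add_function)
                .set_index(index_names)
            )
    return _helper
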
@index_suspension_on_add
def add_user_statistics(df):
    ...
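
For anyone who wants to try this end to end, here is a minimal sketch of the decorator applied to both a flat frame and an indexed one. The get_user_statistics helper, the n_chars statistic, and the sample data are made up for illustration; only the decorator pattern comes from the post above. It catches KeyError and AttributeError rather than every exception, on the assumption that those are the two errors a missing column produces.

```python
import functools

import pandas as pd


def index_suspension_on_add(add_function):
    """Retry add_function on a reset index if the key columns are missing."""
    @functools.wraps(add_function)
    def _helper(df):
        try:
            return add_function(df)
        except (KeyError, AttributeError):
            # Remember the index levels, flatten, enrich, then restore them.
            index_names = list(df.index.names)
            return (df
                    .reset_index()
                    .pipe(add_function)
                    .set_index(index_names))
    return _helper


def get_user_statistics(user_ids):
    # Hypothetical stand-in: one row per user with a made-up statistic.
    return pd.DataFrame({'userId': user_ids,
                         'n_chars': [len(u) for u in user_ids]})


@index_suspension_on_add
def add_user_statistics(df):
    df_statistics = get_user_statistics(df.userId.unique())
    return df.merge(df_statistics, on='userId', how='left')


# Made-up sample data with the two key columns from the question.
df = pd.DataFrame({'userId': ['u1', 'u1', 'u22'],
                   'sessionId': ['s1', 's2', 's3'],
                   'clicks': [3, 5, 8]})
X = df.set_index(['userId', 'sessionId'])

flat = add_user_statistics(df)     # key columns present: no retry needed
indexed = add_user_statistics(X)   # keys in the index: reset, add, restore
```

Note that the retry path assumes every index level is named. A default unnamed index would not round-trip through reset_index() and set_index() this way.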
Upvotes: 1