Reputation:
I've created a function that takes a dataframe as an argument. I can get different results from the function by passing a different dataframe. I want to get, as text, the name of the variable that was passed as the df argument.
def my_func(df, prop=True):
    fd = df.C1.value_counts(normalize=prop).reset_index()
    fd.columns = ['feature','proportion']
    return fd
# Note: I pass this function within another function that builds a graph using fd
import pandas as pd
df1 = pd.DataFrame({'C1':['A','A','B','D']})
df2 = pd.DataFrame({'C1':['C','C','B','D']})
my_func(df2)
# feature proportion
# 0 C 0.50
# 1 D 0.25
# 2 B 0.25
Desired Functionality
I want to be able to save fd for each dataframe under the names df1_fd and df2_fd right within the function my_func, so that I can then access them globally.
So I figured that if there were a way to get the text of the arguments passed to a function, I could use it to create global variables from within my_func.
Like so:
def my_func(df, prop=True):
    fd = df.C1.value_counts(normalize=prop).reset_index()
    fd.columns = ['feature','proportion']
    df_name = get_text_of_arg()[0]  # pseudocode: here I want the code that gets
                                    # 'df1' or 'df2', whichever df is passed to the function
    global df_name_fd  # pseudocode: create a uniquely named global variable, e.g. df2_fd
    df_name_fd = fd    # save fd under that unique name
    return fd
my_func(df2) # returns fd for df2 and saves it under the unique name df2_fd
# feature proportion
# 0 C 0.50
# 1 D 0.25
# 2 B 0.25
# Calling the fd for df2
df2_fd
# feature proportion
# 0 C 0.50
# 1 D 0.25
# 2 B 0.25
Upvotes: 0
Views: 427
Reputation: 1257
Let me preface this answer by saying this should not be done. If you want access to the results, maintain a collection of results. First, the solution you asked for but should not use (credit to Ivo Wetzel here for the lookup on the attribute names):
import inspect
import functools
import pandas as pd
def return_df_to_globals(prefix):
    def _return_df_to_globals(f):
        @functools.wraps(f)
        def wrapped(df, *args, **kwargs):
            # Read the caller's source line to recover the text of the arguments.
            frame = inspect.currentframe()
            frame = inspect.getouterframes(frame)[1]
            string = inspect.getframeinfo(frame[0]).code_context[0].strip()
            # Only works for single-line calls that pass every argument by keyword.
            assignments = string[string.find('(') + 1:-1].split(',')
            df_input_name = next(v.strip() for k, v in map(lambda a: a.split("="), assignments) if k.strip() == "df")
            ret = f(df, *args, **kwargs)
            globals()["_".join([prefix, df_input_name])] = ret
            return ret
        return wrapped
    return _return_df_to_globals
@return_df_to_globals(prefix="f")
def my_func(df, prop=True):
    fd = df.value_counts(normalize=prop).reset_index()
    fd.columns = ['feature','count','proportion']
    return fd
df1 = pd.DataFrame({'C1':['A','A','B'], 'C2':[10,20,30]})
df2 = pd.DataFrame({'C1':['C','C','B'], 'C2':[100,200,300]})
my_func(prop=True, df=df1)
f_df1 # exists, with return value of the call.
On your question regarding why this is not advisable:
You have already made one improvement (providing the name explicitly). To not pollute or otherwise endanger the global namespace, here is a suggestion:
import inspect
import functools
def collect_result(f):
    # check that the function does not use the parameter names reserved by this decorator
    if {"collect_in", "collect_name"}.intersection(inspect.signature(f).parameters.keys()):
        raise ValueError("Error: function signature contains parameters collect_in or collect_name.")
    @functools.wraps(f)
    def fun(*args, **kwargs):
        collect_in = kwargs.pop("collect_in", None)
        collect_name = kwargs.pop("collect_name", None)
        ret = f(*args, **kwargs)
        if collect_in is not None and collect_name is not None:
            collect_in[collect_name] = ret
        return ret
    return fun
You can then decorate your functions with collect_result and pass collect_in with a dictionary (modifications to use SimpleNamespace or similar are also possible) and collect_name, using the naming strategy you also employed in your solution, whenever you wish to also write the result to a dictionary:
results = {}
@collect_result
def foo(a, b):
    return a+b
foo(1, 2,
    collect_in=results,
    collect_name="123")
results["123"] # 3
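The SimpleNamespace variant mentioned above could look roughly like the following. This is only a sketch: the decorator name collect_result_attr and the result name foo_1_2 are made up for illustration, and the only real change from the dictionary version is that the result is stored with setattr instead of item assignment.

```python
import functools
import inspect
from types import SimpleNamespace

def collect_result_attr(f):
    # Hypothetical variant of the decorator above: stores results as attributes.
    if {"collect_in", "collect_name"} & set(inspect.signature(f).parameters):
        raise ValueError("function signature contains collect_in or collect_name")
    @functools.wraps(f)
    def fun(*args, **kwargs):
        collect_in = kwargs.pop("collect_in", None)
        collect_name = kwargs.pop("collect_name", None)
        ret = f(*args, **kwargs)
        if collect_in is not None and collect_name is not None:
            # Attribute assignment instead of collect_in[collect_name] = ret
            setattr(collect_in, collect_name, ret)
        return ret
    return fun

results = SimpleNamespace()

@collect_result_attr
def foo(a, b):
    return a + b

foo(1, 2, collect_in=results, collect_name="foo_1_2")
print(results.foo_1_2)  # 3
```

Because the name becomes an attribute, collect_name has to be a valid Python identifier if you want to read it back with dot notation; a name like "123" would only be reachable through getattr.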
Of course, still better would be to just:
ret = foo(1, 2)
results["my_result"] = ret
Which then means that, in whatever local scope we are in (rendering all the work above for naught), we could just:
my_result = foo(1, 2)
# or as in your case instead of func(df2, 'df2')
df2_fd = func(df2)
Then command-query separation is adhered to: you don't need to silently modify the global namespace, and the code is overall far more fault-resilient than otherwise.
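As a minimal sketch of the "maintain a collection of results" suggestion applied to the question's data: assuming the two-column my_func from the question and a dictionary I have called frames (a hypothetical name), the dataframe names are simply the keys, so nothing has to be recovered from the call site.

```python
import pandas as pd

def my_func(df, prop=True):
    fd = df.C1.value_counts(normalize=prop).reset_index()
    fd.columns = ['feature', 'proportion']
    return fd

# Keep the inputs in a dict so that each name is available as plain data.
frames = {
    'df1': pd.DataFrame({'C1': ['A', 'A', 'B', 'D']}),
    'df2': pd.DataFrame({'C1': ['C', 'C', 'B', 'D']}),
}

# One result per input, keyed by a name derived from the input's name.
results = {f'{name}_fd': my_func(df) for name, df in frames.items()}

print(results['df2_fd'])
```

The graphing function can then be handed results['df2_fd'] (or loop over results.items()) without any global variables being created behind the scenes.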
Upvotes: 1
Reputation:
Here is a way that takes an extra argument.
def func(df, name=None, prop=True):
    fd = df.C1.value_counts(normalize=prop).reset_index()
    fd.columns = ['feature','proportion']
    if name is not None:
        globals()[name+'_fd'] = fd
    return fd
import pandas as pd
df1 = pd.DataFrame({'C1':['A','A','B','D']})
df2 = pd.DataFrame({'C1':['C','C','B','D']})
func(df2, 'df2')
# feature proportion
# 0 C 0.50
# 1 B 0.25
# 2 D 0.25
df2_fd
# feature proportion
# 0 C 0.50
# 1 B 0.25
# 2 D 0.25
func(df1) # will not save separately
# feature proportion
# 0 A 0.50
# 1 B 0.25
# 2 D 0.25
df1_fd
# NameError: name 'df1_fd' is not defined
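One caveat, sketched below: globals() inside a function returns the namespace of the module where that function is defined, not the caller's. So if func is moved into a separate module and imported, df2_fd would end up in that module. The in-memory module helpers and the stand-in function save_global are hypothetical, built only to illustrate this.

```python
import types

# Hypothetical helper module, created in memory purely for illustration.
helpers = types.ModuleType("helpers")
exec(
    "def save_global(name, value):\n"
    "    # globals() here is the namespace of 'helpers', not of the caller\n"
    "    globals()[name] = value\n",
    helpers.__dict__,
)

helpers.save_global("df2_fd", 42)

print(hasattr(helpers, "df2_fd"))  # True: stored in the defining module
print("df2_fd" in globals())       # False: the calling namespace is untouched
```

With everything in a single script or notebook, as in the example above, the two namespaces are the same and the approach behaves as shown.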
Upvotes: 0