user9329768

Python - Get text of arguments passed to a function and use it to assign global variables

I've created a function that takes a dataframe as an argument. I get different results from the function depending on which dataframe I pass in. I want to get, as text, the name of the argument that specifies the df passed to the function.

def my_func(df, prop=True):
    fd = df.C1.value_counts(normalize=prop).reset_index()
    fd.columns = ['feature','proportion']
    return fd
# Note: I pass this function within another function that builds a graph using fd

import pandas as pd
df1 = pd.DataFrame({'C1':['A','A','B','D']})
df2 = pd.DataFrame({'C1':['C','C','B','D']})

my_func(df2)
#   feature  proportion
# 0       C        0.50
# 1       D        0.25
# 2       B        0.25

Desired Functionality
I want to be able to save fd for each dataframe under the names df1_fd and df2_fd right within the function my_func, so that I can then access them globally. So I figured that if there were a way to get the text of the arguments passed to a function, I could use it to create global variables from within my_func. Like so:

def my_func(df, prop=True):
    fd = df.C1.value_counts(normalize=prop).reset_index()
    fd.columns = ['feature','proportion']
    df_name = get_text_of_arg()[0]       # here I want the code that gets 
                                         # 'df1' or 'df2', whatever df is used in function
    global df_name_fd          # create unique name as global variable
    df_name_fd = fd            # save fd with unique name
    return fd
my_func(df2)     # returns fd for df2 and saves it with unique name df2_fd
#   feature  proportion
# 0       C        0.50
# 1       D        0.25
# 2       B        0.25
# Calling the fd for df2
df2_fd
#   feature  proportion
# 0       C        0.50
# 1       D        0.25
# 2       B        0.25

Upvotes: 0

Views: 427

Answers (2)

sim

Reputation: 1257

Let me preface this answer by saying this should not be done. If you want to have access to the results, then maintain a collection of results. First, the solution you asked for but should not use (credit to Ivo Wetzel for the lookup of the argument names):

import inspect
import functools
import pandas as pd


def return_df_to_globals(prefix):
    def _return_df_to_globals(f):
        @functools.wraps(f)
        def wrapped(df, *args, **kwargs):
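            # Look one frame up the stack to find the source line of the call.
            frame = inspect.currentframe()
            frame = inspect.getouterframes(frame)[1]
            string = inspect.getframeinfo(frame[0]).code_context[0].strip()
            # Parse "my_func(prop=True, df=df1)" into ["prop=True", " df=df1"].
            # Only works if every argument is passed by keyword on a single line.
            assignments = string[string.find('(') + 1:-1].split(',')
            df_input_name = next(v for k, v in map(lambda a: a.split("="), assignments) if k.strip() == "df")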
            ret = f(df, *args, **kwargs)
            globals()["_".join([prefix, df_input_name])] = ret
            return ret
        return wrapped
    return _return_df_to_globals

@return_df_to_globals(prefix="f")
def my_func(df, prop=True):
    fd = df.value_counts(normalize=prop).reset_index()
    fd.columns = ['feature','count','proportion']
    return fd

df1 = pd.DataFrame({'C1':['A','A','B'], 'C2':[10,20,30]})
df2 = pd.DataFrame({'C1':['C','C','B'], 'C2':[100,200,300]})
my_func(prop=True, df=df1)
f_df1  # exists, with return value of the call.

On your question regarding why this is not advisable:

  • To get at the argument names, you need to inspect frame information from the interpreter stack. It is not intended for such uses, and I am sure there are corner cases that break the example above (for instance, it only parses calls that pass every argument by keyword on a single line; maybe somebody else can elaborate).
  • Separating commands and queries (see command query separation) is generally considered good style and avoids misconceptions about the system state. Your function both has side effects (it adds to the global namespace) and returns the result of a query. Fowler's article also mentions valid exceptions to the principle; a cache might be another good one.
  • Along the lines of the last point: you could very easily overwrite a name in the global namespace.
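To illustrate why the name lookup is fragile, here is a minimal sketch showing that an object does not carry a single name with it. The helper names_bound_to and the explicit namespace dict are hypothetical, introduced only for this illustration, and a plain list stands in for a DataFrame:

```python
def names_bound_to(obj, namespace):
    """Return every name in `namespace` that is bound to exactly this object."""
    return sorted(name for name, value in namespace.items() if value is obj)


data = ["A", "A", "B"]   # stand-in for a DataFrame
alias = data             # a second name for the very same object

# An explicit namespace (instead of globals()) keeps the example deterministic.
namespace = {"data": data, "alias": alias, "other": ["A", "A", "B"]}

print(names_bound_to(data, namespace))    # ['alias', 'data']
print(names_bound_to(["C"], namespace))   # []
```

The same object can have two names, or none at all when it is the result of an expression, so there is no single "name of the argument" for a function to recover.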

You have already made one improvement (providing the name explicitly). To not pollute or otherwise endanger the global namespace, here is a suggestion:

import inspect
import functools

def collect_result(f):
    # check that function does not use parameter names used by cache
    if {"collect_in", "collect_name"}.intersection(inspect.signature(f).parameters.keys()):
        raise ValueError("Error: function signature contains parameters collect_in or collect_name.")
        
    @functools.wraps(f)
    def fun(*args, **kwargs):
        collect_in = kwargs.pop("collect_in", None)
        collect_name = kwargs.pop("collect_name", None)
        ret = f(*args, **kwargs)
        if collect_in is not None and collect_name is not None:
            collect_in[collect_name] = ret
        return ret
    return fun

You can then decorate your functions with collect_result. Whenever you also want the result written to a collection, pass a dictionary as collect_in (a SimpleNamespace or similar would also work with small modifications) and a key as collect_name, using the same naming strategy you employed in your solution:

results = {}

@collect_result
def foo(a, b):
    return a+b

foo(1, 2, 
    collect_in=results,
    collect_name="123")

results["123"]  # 3

Of course, still better would be to just:

ret = foo(1, 2)
results["my_result"] = ret

Which means that in whatever local scope you are in (rendering all the work above for naught), you could just write:

my_result = foo(1, 2)
# or, in your case, instead of func(df2, 'df2')
df2_fd = func(df2)

Then command-query separation is adhered to: you don't silently modify the global namespace, and the code is overall far more fault-resilient.
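Applied to several inputs at once, the "collection of results" idea could look roughly like this. It is a sketch under assumptions: proportions is a plain-Python stand-in for my_func so that the example runs without pandas, and the inputs dict is hypothetical. With pandas you would store the dataframes in the dict and call my_func instead.

```python
from collections import Counter


def proportions(values):
    """Stand-in for my_func: the share of each distinct value, most common first."""
    counts = Counter(values)
    total = len(values)
    return {value: count / total for value, count in counts.most_common()}


# Name the inputs once, explicitly, instead of recovering variable names.
inputs = {
    "df1": ["A", "A", "B", "D"],
    "df2": ["C", "C", "B", "D"],
}

# One result per input, stored under a derived key.
results = {f"{name}_fd": proportions(values) for name, values in inputs.items()}

print(results["df2_fd"])  # {'C': 0.5, 'B': 0.25, 'D': 0.25}
```

The names live in dictionary keys, where they are ordinary data, so nothing has to be parsed out of the call site and nothing is written to the global namespace.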

Upvotes: 1

user9329768

Here is a way that takes an extra argument.

def func(df, name=None, prop=True):
    fd = df.C1.value_counts(normalize=prop).reset_index()
    fd.columns = ['feature','proportion']
    if name is not None:
        globals()[name+'_fd'] = fd
    return fd

import pandas as pd
df1 = pd.DataFrame({'C1':['A','A','B','D']})
df2 = pd.DataFrame({'C1':['C','C','B','D']})

func(df2, 'df2')
#   feature  proportion 
# 0       C        0.50 
# 1       B        0.25 
# 2       D        0.25

df2_fd
#   feature  proportion 
# 0       C        0.50 
# 1       B        0.25 
# 2       D        0.25

func(df1)  # will not save separately
#   feature  proportion
# 0       A        0.50
# 1       B        0.25
# 2       D        0.25

df1_fd
# NameError: name 'df1_fd' is not defined

Upvotes: 0
