Codutie

Reputation: 1075

python: how to return a DataFrame or a list from a function?

This question is likely a duplicate, but I haven't found an answer yet. I'm trying to apply a function to a pandas DataFrame and I want to get a DataFrame back. The following example is reproducible:

from datetime import datetime

import pandas as pd

df = pd.DataFrame({'ID': ["1","2"],
                   'Start': datetime.strptime('20160701', '%Y%m%d'),
                   'End': datetime.strptime('20170701', '%Y%m%d'),
                   'Value': [100, 200],
                   'CreditNote': [-20, -30]})

My function:

def act_value_calc(x):
    start_delta = (x.Start.replace(day=31,month=12) - x.Start).days
    full_delta = (x.End - x.Start).days
    result1 = round( (x.Value + x.CreditNote) / full_delta * start_delta, 2)
    result2 = round( (x.Value + x.CreditNote) - result1, 2)
    return(pd.DataFrame({'r1': [result1],'r2': [result2]}))

Why can't I run the following code ...

df.apply(act_value_calc, 1)

and what should I change to make it run, so that I get a DataFrame or a list back with result1 and result2?

Upvotes: 1

Views: 6087

Answers (3)

Anupam khare

Reputation: 99

You can declare a global variable inside the function and then assign a DataFrame to it:

def act_value_calc(x):
    start_delta = (x.Start.replace(day=31,month=12) - x.Start).days
    full_delta = (x.End - x.Start).days
    result1 = round( (x.Value + x.CreditNote) / full_delta * start_delta, 2)
    result2 = round( (x.Value + x.CreditNote) - result1, 2)
    global df  # declare df as global (this rebinds the module-level name df)
    df = pd.DataFrame({'r1': [result1],'r2': [result2]})

Upvotes: 1

pansen

Reputation: 6663

You can make it easier for yourself by returning a pandas.Series instead of a pandas.DataFrame:

def act_value_calc(x):
    start_delta = (x.Start.replace(day=31,month=12) - x.Start).days
    full_delta = (x.End - x.Start).days
    result1 = round( (x.Value + x.CreditNote) / full_delta * start_delta, 2)
    result2 = round( (x.Value + x.CreditNote) - result1, 2)
    return(pd.Series({'r1': result1,'r2': result2}))

print(df.apply(act_value_calc, 1))
    r1      r2
0   40.11   39.89
1   85.23   84.77
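
Since the question also asks about getting a list back, here is a minimal sketch of a related variant: the function returns a plain list and `result_type='expand'` (available in `DataFrame.apply` since pandas 0.23) spreads it into columns. The function name `act_value_calc_list` and the variable `expanded` are made up for this illustration; the arithmetic is copied from the question.

```python
from datetime import datetime

import pandas as pd

df = pd.DataFrame({'ID': ["1", "2"],
                   'Start': datetime.strptime('20160701', '%Y%m%d'),
                   'End': datetime.strptime('20170701', '%Y%m%d'),
                   'Value': [100, 200],
                   'CreditNote': [-20, -30]})


def act_value_calc_list(x):
    # Hypothetical variant of act_value_calc that returns a plain list.
    start_delta = (x.Start.replace(day=31, month=12) - x.Start).days
    full_delta = (x.End - x.Start).days
    result1 = round((x.Value + x.CreditNote) / full_delta * start_delta, 2)
    result2 = round((x.Value + x.CreditNote) - result1, 2)
    return [result1, result2]


# result_type='expand' turns each returned list into one column per element.
expanded = df.apply(act_value_calc_list, axis=1, result_type='expand')
# The expanded columns are numbered 0 and 1, so name them explicitly.
expanded.columns = ['r1', 'r2']
print(expanded)
```

The Series version above carries the column names itself, so this form is mainly useful when the function already returns a list or tuple.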

Upvotes: 0

conner.xyz

Reputation: 7275

apply returns one value per row or per column, depending on the axis argument you provide (I believe you understand this already, given that you are passing an axis argument of 1).

Returning a DataFrame from apply is problematic. What you probably want to do is create a new column with the values returned by the function you are applying.

Something like

def act_value_calc1(x):
    start_delta = (x.Start.replace(day=31,month=12) - x.Start).days
    full_delta = (x.End - x.Start).days
    result1 = round( (x.Value + x.CreditNote) / full_delta * start_delta, 2)
    return result1

def act_value_calc2(x):
    # Relies on the 'result1' column, so it must run after act_value_calc1.
    result2 = round( (x.Value + x.CreditNote) - x.result1, 2)
    return result2

df['result1'] = df.apply(act_value_calc1, axis=1)
df['result2'] = df.apply(act_value_calc2, axis=1)
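
If the goal is only to add the two columns, the same numbers can also be computed without apply at all. This is a different technique from the one above (column-wise arithmetic instead of a row-wise function), shown as a rough sketch. It assumes Start and End are datetime columns, as in the question; the intermediate names `year_end`, `start_delta`, `full_delta` and `net` are made up for the illustration.

```python
from datetime import datetime

import pandas as pd

df = pd.DataFrame({'ID': ["1", "2"],
                   'Start': datetime.strptime('20160701', '%Y%m%d'),
                   'End': datetime.strptime('20170701', '%Y%m%d'),
                   'Value': [100, 200],
                   'CreditNote': [-20, -30]})

# 31 December of each Start year, built from year/month/day components.
year_end = pd.to_datetime(pd.DataFrame({'year': df['Start'].dt.year,
                                        'month': 12,
                                        'day': 31}))

# Day counts as integer Series, one entry per row.
start_delta = (year_end - df['Start']).dt.days
full_delta = (df['End'] - df['Start']).dt.days

# Net amount, then the same split as in the row-wise functions.
net = df['Value'] + df['CreditNote']
df['result1'] = (net / full_delta * start_delta).round(2)
df['result2'] = (net - df['result1']).round(2)
print(df[['result1', 'result2']])
```

On a frame this small the difference does not matter; the column-wise form mostly pays off on large DataFrames, where calling a Python function per row becomes slow.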

Upvotes: 0
