Reputation: 1075
This question is likely a duplicate, but I haven't found an answer yet. I'm trying to apply a function to a pandas DataFrame and I want to get a DataFrame back. The following example is reproducible:
from datetime import datetime
import pandas as pd

df = pd.DataFrame({'ID': ["1", "2"],
                   'Start': datetime.strptime('20160701', '%Y%m%d'),
                   'End': datetime.strptime('20170701', '%Y%m%d'),
                   'Value': [100, 200],
                   'CreditNote': [-20, -30]})
My function:
def act_value_calc(x):
    start_delta = (x.Start.replace(day=31, month=12) - x.Start).days
    full_delta = (x.End - x.Start).days
    result1 = round((x.Value + x.CreditNote) / full_delta * start_delta, 2)
    result2 = round((x.Value + x.CreditNote) - result1, 2)
    return pd.DataFrame({'r1': [result1], 'r2': [result2]})
Why can't I run the following code ...
df.apply(act_value_calc, 1)
and what should be done to make it run? I want to get a DataFrame or a list back with result1 and result2.
Upvotes: 1
Views: 6087
Reputation: 99
You can collect the results in a global variable from within the function and then create a DataFrame out of it:
results = []  # global list that collects one dict per row

def act_value_calc(x):
    start_delta = (x.Start.replace(day=31, month=12) - x.Start).days
    full_delta = (x.End - x.Start).days
    result1 = round((x.Value + x.CreditNote) / full_delta * start_delta, 2)
    result2 = round((x.Value + x.CreditNote) - result1, 2)
    # Append to the global list; rebinding df here would overwrite the input frame
    results.append({'r1': result1, 'r2': result2})

df.apply(act_value_calc, axis=1)
result_df = pd.DataFrame(results)
Upvotes: 1
Reputation: 6663
You can make it easier for yourself by returning a pandas.Series instead of a pandas.DataFrame:
def act_value_calc(x):
    start_delta = (x.Start.replace(day=31, month=12) - x.Start).days
    full_delta = (x.End - x.Start).days
    result1 = round((x.Value + x.CreditNote) / full_delta * start_delta, 2)
    result2 = round((x.Value + x.CreditNote) - result1, 2)
    return pd.Series({'r1': result1, 'r2': result2})

print(df.apply(act_value_calc, 1))
      r1     r2
0  40.11  39.89
1  85.23  84.77
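If the goal is to keep the new values next to the original columns, the Series-returning function above can probably be combined with DataFrame.join. This is only a minimal sketch, assuming the df and act_value_calc from the question; the names results and combined are made up for illustration:

```python
from datetime import datetime

import pandas as pd

# Same example data as in the question
df = pd.DataFrame({'ID': ["1", "2"],
                   'Start': datetime.strptime('20160701', '%Y%m%d'),
                   'End': datetime.strptime('20170701', '%Y%m%d'),
                   'Value': [100, 200],
                   'CreditNote': [-20, -30]})


def act_value_calc(x):
    start_delta = (x.Start.replace(day=31, month=12) - x.Start).days
    full_delta = (x.End - x.Start).days
    result1 = round((x.Value + x.CreditNote) / full_delta * start_delta, 2)
    result2 = round((x.Value + x.CreditNote) - result1, 2)
    # Returning a Series makes apply build one column per key
    return pd.Series({'r1': result1, 'r2': result2})


# 'results' and 'combined' are illustrative names, not from the original post
results = df.apply(act_value_calc, axis=1)
combined = df.join(results)  # aligns on the row index
print(combined[['ID', 'r1', 'r2']])
```

Because join aligns on the index, this should work as long as df.apply is run on the same frame that is joined afterwards.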
Upvotes: 0
Reputation: 7275
apply will return some value per row, or per column, depending on the axis argument you provide (I believe you understand this already, given you are providing an axis arg of 1).
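Returning a DataFrame from apply is problematic, because apply expects each call to return a scalar or a Series, which it then assembles into the final result. What you probably want to do is create a new column with the values returned by the function you are applying.
Something like: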
def act_value_calc1(x):
    start_delta = (x.Start.replace(day=31, month=12) - x.Start).days
    full_delta = (x.End - x.Start).days
    result1 = round((x.Value + x.CreditNote) / full_delta * start_delta, 2)
    return result1

def act_value_calc2(x):
    # Relies on the 'result1' column, so it must run after act_value_calc1
    result2 = round((x.Value + x.CreditNote) - x.result1, 2)
    return result2

df['result1'] = df.apply(act_value_calc1, axis=1)
df['result2'] = df.apply(act_value_calc2, axis=1)
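The two passes above could likely be merged into one by returning both values from a single function and letting apply expand them into columns with result_type='expand' (available in pandas 0.23 and later). A minimal sketch, assuming the df from the question; act_values and expanded are hypothetical names introduced here:

```python
from datetime import datetime

import pandas as pd

# Same example data as in the question
df = pd.DataFrame({'ID': ["1", "2"],
                   'Start': datetime.strptime('20160701', '%Y%m%d'),
                   'End': datetime.strptime('20170701', '%Y%m%d'),
                   'Value': [100, 200],
                   'CreditNote': [-20, -30]})


def act_values(x):
    """Hypothetical helper: compute both results for one row."""
    start_delta = (x.Start.replace(day=31, month=12) - x.Start).days
    full_delta = (x.End - x.Start).days
    result1 = round((x.Value + x.CreditNote) / full_delta * start_delta, 2)
    result2 = round((x.Value + x.CreditNote) - result1, 2)
    return result1, result2


# result_type='expand' turns each returned tuple into one column per element
expanded = df.apply(act_values, axis=1, result_type='expand')
expanded.columns = ['result1', 'result2']
df = df.join(expanded)
print(df[['ID', 'result1', 'result2']])
```

This avoids the ordering dependency between the two functions, at the cost of naming the columns in a separate step.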
Upvotes: 0