Reputation: 681
I have a function that returns a one-row pd.DataFrame (a "oneliner"). I have wrapped that function in a loop and want to aggregate the results based on an index.
def func(input):
    # some calculation
    return oneliner
oneliner looks like this:
date return return_lev
20100101 0.05 0.0725
My loop now produces several of these oneliners, and I would like to aggregate all oneliners that share the same date and otherwise just append the oneliner:
df = []
for x in range(0, 10):
    res = func(x)
    df.append(res).groupby(by = 'date').sum()
However, this tells me:
AttributeError: 'NoneType' object has no attribute 'groupby'
Even if I take out groupby, I get the error:
AttributeError: 'NoneType' object has no attribute 'sum'
Any idea how I could solve this?
Edit: here is a function that produces random numbers as a oneliner, similar to my results.
import numpy as np
import pandas as pd
from random import randint
df_date = pd.DataFrame(['20100101', '20100102',
                        '20100103', '20100104', '20100105'], columns = ['date'])
def test_func(i):
    a = randint(0, 9) + i
    b = randint(0, 9) / 10 + i
    c = randint(0, 9) + i
    d = randint(0, 9) / 10 + i
    datetime = df_date.sample(1)
    a_s = pd.Series(a, dtype = int)
    b_s = pd.Series(b, dtype = float)
    c_s = pd.Series(c, dtype = int)
    d_s = pd.Series(d, dtype = float)
    overview = pd.DataFrame(np.concatenate([a_s, b_s, c_s, d_s]).reshape(1, 4),
                            columns = ['a', 'b', 'c', 'd'], index = datetime)
    return overview
now with my previous attempt:
dfs_test = []
for x in range(5):
    test_results = test_func(x)
    dfs_test.append(test_results).groupby(by = 'datetime').sum()
this gives me as above
AttributeError: 'NoneType' object has no attribute 'groupby'
now with the other version, where I produce an array / list:
from random import randint
def test_func_2(i):
    a = randint(0, 9) + i
    b = randint(0, 9) / 10 + i
    c = randint(0, 9) + i
    d = randint(0, 9) / 10 + i
    datetime = df_date.sample(1)
    a_s = pd.Series(a, dtype = int)
    b_s = pd.Series(b, dtype = float)
    c_s = pd.Series(c, dtype = int)
    d_s = pd.Series(d, dtype = float)
    overview = [datetime, a_s, b_s, c_s, d_s]
    return overview
and now with the list version:
dfs_test_2 = pd.DataFrame([test_func_2(z) for z in range(5)],
                          columns=['datetime', 'a', 'b', 'c', 'd'])
dfs_test_2 = dfs_test_2.groupby('datetime').sum().reset_index()
Upvotes: 3
Views: 381
Reputation: 164623
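Returning a list of dataframes and then appending them, or adding results one at a time in a loop, is inefficient. Note also that list.append modifies the list in place and returns None, which is why chaining groupby or sum onto it raises the AttributeError.
Instead, I advise you to output a list of lists and then build your dataframe in one step.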
def func(var):
    """Return list of [date, return, return_lev]"""
    # some calculation
    return [a, b, c]
# build dataframe
df = pd.DataFrame([func(x) for x in range(10)],
                  columns=['date', 'return', 'return_lev'])
# perform groupby
df = df.groupby('date').sum().reset_index()
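If the function has to keep returning one-row dataframes, a possible alternative is to collect them in a list and combine them once after the loop. The sketch below also shows where the AttributeError in the question comes from: list.append returns None. The helper make_row and its values are hypothetical stand-ins for the real calculation, not part of the original code.

```python
import pandas as pd

def make_row(i):
    """Hypothetical stand-in for func: returns a one-row DataFrame."""
    dates = ['20100101', '20100102']
    return pd.DataFrame({'date': [dates[i % 2]],
                         'return': [0.01 * i],
                         'return_lev': [0.02 * i]})

frames = []
# list.append mutates the list in place and returns None,
# so nothing can be chained onto its result.
result_of_append = frames.append(make_row(0))
print(result_of_append is None)

for x in range(1, 4):
    frames.append(make_row(x))

# Combine all one-row frames in a single step, then aggregate by date.
combined = pd.concat(frames, ignore_index=True)
aggregated = combined.groupby('date').sum().reset_index()
print(aggregated)
```

Concatenating once after the loop avoids rebuilding a dataframe on every iteration, although building from a list of lists as above is usually still cheaper.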
Update: your function, which is meant to return a list of scalars, actually returns a list of pd.Series objects (plus a one-row pd.DataFrame for the date).
Try something like the below:
def test_func_2(i):
    a = randint(0, 9) + i
    b = randint(0, 9) / 10 + i
    c = randint(0, 9) + i
    d = randint(0, 9) / 10 + i
    datetime = df_date.sample(1).values[0][0]
    overview = [datetime, a, b, c, d]
    return overview
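To see why the scalar extraction matters, here is a small sketch comparing what df_date.sample(1) returns with what .values[0][0] pulls out of it, followed by the groupby on rows built from plain scalars. The shortened date list and the hard-coded numbers are assumptions for illustration only; .iloc[0, 0] is shown as an equivalent lookup, not something the original answer uses.

```python
import pandas as pd

df_date = pd.DataFrame(['20100101', '20100102', '20100103'], columns=['date'])

sampled = df_date.sample(1)         # a one-row DataFrame, not a scalar
as_scalar = sampled.values[0][0]    # the date string itself, as in the answer
also_scalar = sampled.iloc[0, 0]    # equivalent positional lookup

print(type(sampled).__name__, type(as_scalar).__name__)

# Rows made of plain scalars build a clean frame that groupby can sum.
rows = [[as_scalar, 1, 0.5],
        [as_scalar, 2, 0.25]]
df = pd.DataFrame(rows, columns=['datetime', 'a', 'b'])
grouped = df.groupby('datetime').sum().reset_index()
print(grouped)
```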
Upvotes: 3