eternity1

Reputation: 681

Sum elements in a list by index within a loop

I have a function that returns a one-row pd.DataFrame (a "oneliner"). I have wrapped that function in a loop and want to aggregate the results by an index.

def func(input):
    # some calculation
    return oneliner

The oneliner looks like this:

date         return   return_lev
20100101      0.05      0.0725

My loop now produces several of these oneliners. I would like to aggregate all oneliners that share the same date, and otherwise just append the oneliner:

df = []
for x in range(0, 10):
    res = func(x)
    df.append(res).groupby(by = 'date').sum()

However, this tells me:

AttributeError: 'NoneType' object has no attribute 'groupby'

Even if I take out groupby, I get the error:

AttributeError: 'NoneType' object has no attribute 'sum'

Any idea how I could solve this?

Edit: here is a function that produces random numbers as a oneliner, similar to my results.

import numpy as np
import pandas as pd
from random import randint

df_date = pd.DataFrame(['20100101', '20100102', 
                    '20100103', '20100104', '20100105'], columns = ['date'])

def test_func(i):
    a = randint(0, 9) + i
    b = randint(0, 9) / 10 + i
    c = randint(0, 9) + i
    d = randint(0, 9) / 10 + i
    datetime = df_date.sample(1)

    a_s = pd.Series(a, dtype = int)
    b_s = pd.Series(b, dtype = float)
    c_s = pd.Series(c, dtype = int)
    d_s = pd.Series(d, dtype = float)

    overview = pd.DataFrame(np.concatenate([a_s, b_s, c_s, d_s]).reshape(1, 4), 
                            columns = ['a', 'b', 'c', 'd'], index = datetime)

    return overview

Now with my previous attempt:

dfs_test = []

for x in range(5):
    test_results = test_func(x)
    dfs_test.append(test_results).groupby(by = 'datetime').sum()

This gives me the same error as above:

AttributeError: 'NoneType' object has no attribute 'groupby'

Now the other version, where the function produces an array / list:

from random import randint

def test_func_2(i):
    a = randint(0, 9) + i
    b = randint(0, 9) / 10 + i
    c = randint(0, 9) + i
    d = randint(0, 9) / 10 + i
    datetime = df_date.sample(1)

    a_s = pd.Series(a, dtype = int)
    b_s = pd.Series(b, dtype = float)
    c_s = pd.Series(c, dtype = int)
    d_s = pd.Series(d, dtype = float)

    overview = [datetime, a_s, b_s, c_s, d_s]

    return overview

And now with the list version:

dfs_test_2 = pd.DataFrame([test_func_2(z) for z in range(5)],
                  columns=['datetime', 'a', 'b', 'c', 'd'])

dfs_test_2 = dfs_test_2.groupby('datetime').sum().reset_index()

Upvotes: 3

Views: 381

Answers (1)

jpp

Reputation: 164623

Returning one-row dataframes and then appending or aggregating them one at a time inside a loop is inefficient.

Instead, I advise having the function return a plain list, collecting those into a list of lists, and building your dataframe in one step.

def func(var):
    """Return list of [date, return, return_lev]"""
    # some calculation
    return [a, b, c]

# build dataframe
df = pd.DataFrame([func(x) for x in range(10)],
                  columns=['date', 'return', 'return_lev'])

# perform groupby
df = df.groupby('date').sum().reset_index()
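
For reference, the AttributeError in the question arises because list.append modifies the list in place and returns None, so anything chained onto it (.groupby, .sum) is called on None. If the function has to keep returning a one-row dataframe, a minimal sketch of the same "collect first, aggregate once" idea using pd.concat might look like this. The body of func below is a hypothetical stand-in with made-up dates and numbers, not the asker's calculation.

```python
import pandas as pd


def func(x):
    """Hypothetical stand-in: returns a one-row DataFrame like the asker's."""
    # Assumption: alternate between two dates so that some rows share a date.
    date = "20100101" if x % 2 == 0 else "20100102"
    return pd.DataFrame(
        {"date": [date], "return": [0.01 * x], "return_lev": [0.02 * x]}
    )


frames = []
for x in range(4):
    # list.append returns None, so do not chain anything onto it.
    frames.append(func(x))

# Concatenate once, then aggregate the rows that share a date.
result = (
    pd.concat(frames, ignore_index=True)
    .groupby("date", as_index=False)
    .sum()
)
print(result)
```

Building the list of lists as shown above avoids creating many small dataframes, so the concat variant is mainly useful when the function's return type cannot be changed.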

Update: your function, which is meant to return a list of scalars, actually returns a list of pd.Series objects (plus a one-row dataframe for datetime).

Try something like the below:

def test_func_2(i):
    a = randint(0, 9) + i
    b = randint(0, 9) / 10 + i
    c = randint(0, 9) + i
    d = randint(0, 9) / 10 + i
    datetime = df_date.sample(1).values[0][0]

    overview = [datetime, a, b, c, d]

    return overview

Upvotes: 3
