Aayush Shah

Reputation: 520

Grouping with a list in df.groupby does not seem to work

I am trying to group rows in pandas by passing groupby a list of group labels.

The objective:

I want to group every N consecutive rows of the data frame, so I pass groupby a list with one group label per row, in row order. Before describing the problem, here is the code I use to group the rows:

import math

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0, 100, (100, 5)))

# Number of rows in each group
n_elems = 20

# Total rows in the dataset
n_rows = df.shape[0]

# Number of groups to create (ceil covers the case where n_rows is not a multiple of n_elems)
n_groups = math.ceil(n_rows / n_elems)

groups = []
for idx in range(n_groups):
    grp = [idx] * n_elems
    groups.extend(grp)
    
# Trim to exactly n_rows labels, since groupby needs one label per row
groups = groups[:n_rows]

# Group by the list of labels
df.groupby(groups).agg(['mean', 'count'])
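The label-building loop can be replaced by a single expression. This sketch checks that `np.arange(n_rows) // n_elems` produces exactly the same labels as the loop, including group sizes that do not divide 100 evenly. The function names `loop_labels` and `vectorised_labels` are only for illustration.

```python
import math

import numpy as np


def loop_labels(n_rows, n_elems):
    """Group labels built with the loop from the question."""
    n_groups = math.ceil(n_rows / n_elems)
    groups = []
    for idx in range(n_groups):
        groups.extend([idx] * n_elems)
    return groups[:n_rows]


def vectorised_labels(n_rows, n_elems):
    """Same labels via integer division of each row position by the group size."""
    return (np.arange(n_rows) // n_elems).tolist()


# Both versions agree, including sizes that leave a shorter last group
for n_elems in (1, 2, 7, 19, 20, 33):
    print(n_elems, loop_labels(100, n_elems) == vectorised_labels(100, n_elems))
```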

The problem:

This works as expected when the number of rows per group (n_elems) is anywhere from 1 to 19: it makes 100 groups when n_elems is 1, 50 groups when n_elems is 2, 20 groups when n_elems is 5, and so on up to 19.

The problem starts at 20. I don't know why it is 20 (it might be a different number for a different number of rows), but with n_elems set to 20 it should return 5 groups of 20 rows each. Instead, it returns a strange-looking DataFrame with 100 rows and 0 columns!
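A likely cause, consistent with the 100-row, 0-column result: when `groupby` is given a plain Python list that is as long as the frame, pandas first checks whether every element is also a column label (or an index level name). If so, it treats the list as column keys instead of one group label per row. The frame here has columns `0` to `4`, so any group size of 20 or more yields only the labels 0 to 4 (at most five groups) and the list is read as column names; every column then becomes a grouping key, nothing is left to aggregate, and each distinct row forms its own group. At 19 or fewer, a label of 5 or more appears and the list is used as row labels. In the sketch below, `make_groups` repeats the question's loop and `looks_like_column_keys` is a hypothetical, simplified restatement of that check, not a pandas function. Passing the labels as a NumPy array avoids the column lookup.

```python
import math

import numpy as np
import pandas as pd


def make_groups(n_rows, n_elems):
    """Group labels built the same way as in the question."""
    n_groups = math.ceil(n_rows / n_elems)
    groups = []
    for idx in range(n_groups):
        groups.extend([idx] * n_elems)
    return groups[:n_rows]


def looks_like_column_keys(df, keys):
    """Hypothetical helper: a simplified version of the test pandas applies to a
    list passed to groupby. True means the list would be read as column keys."""
    return len(keys) == len(df) and all(
        k in df.columns or k in df.index.names for k in keys
    )


df = pd.DataFrame(np.random.randint(0, 100, (100, 5)))  # columns are 0, 1, 2, 3, 4

for n_elems in (19, 20, 25):
    groups = make_groups(len(df), n_elems)
    print(n_elems, sorted(set(groups)), looks_like_column_keys(df, groups))

# A NumPy array is never looked up as column names, so it groups rows as intended
result = df.groupby(np.array(make_groups(len(df), 20))).agg(["mean", "count"])
print(result.shape)  # (5, 10)
```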


I tried searching for an explanation but didn't find anything useful. Any help would improve my understanding of groupby.

Thanks in advance.

Upvotes: 0

Views: 374

Answers (1)

Henry Ecker

Reputation: 35686

Try creating groups by floor dividing the index instead:

n_elems = 2
new_df = df.groupby(df.index // n_elems).agg(['mean', 'sum'])

new_df:

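`df.index // n_elems` assumes the default `RangeIndex` (0, 1, 2, ...). A sketch of a variant that relies only on row positions, and so also works after filtering or with a non-numeric index, is `np.arange(len(df)) // n_elems`. Applied to a frame shaped like the question's (100 rows, 5 columns) with `n_elems = 20`, it gives the expected 5 groups of 20 rows:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0, 100, (100, 5)))
n_elems = 20

# Positional labels: rows 0-19 -> 0, rows 20-39 -> 1, ..., rows 80-99 -> 4
labels = np.arange(len(df)) // n_elems
by_position = df.groupby(labels).agg(["mean", "count"])
print(by_position.shape)  # (5, 10): 5 groups, mean and count for each of 5 columns

# The same labels work on a copy with a string index, where df.index // n_elems
# would not produce group numbers
df_str = df.copy()
df_str.index = [f"row{i}" for i in range(len(df))]
by_position_str = df_str.groupby(labels).agg(["mean", "count"])
print(by_position_str.shape)  # (5, 10)
```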
      0          1          2     
   mean  sum  mean  sum  mean  sum
0  57.5  115  75.5  151  34.5   69
1  71.0  142  17.0   34  53.0  106
2  21.0   42  48.5   97  78.5  157

Sample DF Used:

import numpy as np
import pandas as pd

np.random.seed(5)
df = pd.DataFrame(np.random.randint(0, 100, (6, 3)))

df:

    0   1   2
0  99  78  61
1  16  73   8
2  62  27  30
3  80   7  76
4  15  53  80
5  27  44  77
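The sample frame printed above can be typed in directly, which makes the grouping easy to check by hand: rows 0 and 1 form group 0, so the first `sum` under column 0 is 99 + 16 = 115 and its `mean` is 57.5.

```python
import pandas as pd

# The six sample rows printed above, entered by hand
df = pd.DataFrame(
    [[99, 78, 61],
     [16, 73, 8],
     [62, 27, 30],
     [80, 7, 76],
     [15, 53, 80],
     [27, 44, 77]]
)

n_elems = 2
# Index 0..5 floor-divided by 2 gives labels 0, 0, 1, 1, 2, 2
new_df = df.groupby(df.index // n_elems).agg(["mean", "sum"])
print(new_df)
```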

Upvotes: 1
