farhawa

Reputation: 10398

Group a numpy.array based on the groupby of a pandas.DataFrame with the same length

I have a numpy.array arr and a pandas.DataFrame df.

arr and df have the same shape (x,y).

I need to group df by one of its columns and apply the resulting aggregation to the corresponding rows of arr instead of to df itself.

To be clear, here is a toy example:

arr = 
   0  1   12   3
   2  5   45   47
   3  19  11  111

df =
   A  B   C   D
0  0  1   2   3
1  4  5   6   7
2  4  9  10  11

I want to group df by A and compute the mean, but instead of transforming df I want arr to be transformed.

So I get something like:

    arr = 
        0        1         12          3
       (2+3)/2  (5+19)/2   (45+11)/2   (47+111)/2

Is that possible without expensive loops?

Thanks in advance

Upvotes: 2

Views: 2567

Answers (1)

jezrael

Reputation: 862781

It looks like you first need to create a DataFrame from arr, then group it by column A of df and aggregate with mean. Finally, convert the result back to a numpy array with values:

print(pd.DataFrame(arr).groupby(df.A).mean().values)
[[  0.    1.   12.    3. ]
 [  2.5  12.   28.   79. ]]
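
For reference, here is a self-contained sketch that rebuilds the toy data from the question and runs the same groupby. The way arr and df are constructed is an assumption based on the example in the question, and the names result, keys, inverse, sums, counts and means are only illustrative. It also shows a NumPy-only variant (np.unique plus np.add.at) as a possible alternative, not as part of the original answer:

```python
import numpy as np
import pandas as pd

# Toy data, reconstructed from the question (assumed construction).
arr = np.array([[0, 1, 12, 3],
                [2, 5, 45, 47],
                [3, 19, 11, 111]])
df = pd.DataFrame({"A": [0, 4, 4],
                   "B": [1, 5, 9],
                   "C": [2, 6, 10],
                   "D": [3, 7, 11]})

# pandas route: wrap arr in a DataFrame and group its rows by df["A"].
# Both frames have the default 0..n-1 index here, so the key aligns row by row.
result = pd.DataFrame(arr).groupby(df["A"]).mean().to_numpy()
print(result)

# NumPy-only alternative: map each row to a group id, then sum and count per group.
keys, inverse = np.unique(df["A"].to_numpy(), return_inverse=True)
sums = np.zeros((len(keys), arr.shape[1]))
np.add.at(sums, inverse, arr)      # unbuffered add: accumulates rows sharing a group id
counts = np.bincount(inverse)      # number of rows in each group
means = sums / counts[:, None]
print(means)
```

Note that a Series passed to groupby is aligned on the index. If df does not have the default index, passing df["A"].to_numpy() as the key instead should group the rows by position.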

Upvotes: 2
