Reputation: 21
I have the following problem, I managed to create a data frame with object dtypes on some columns. In particular these would be 2d numpy arrays but they could be any non-numeric type. Now I want to pivot my dataframe. Is there a way to pass an aggregating function of my choice which works on these objects? I don't seem to be able to do it and I get the error:
GroupByError: No numeric types to aggregate
For example, say I have this dummy data frame:
date foo bar mat
1 a x [[1, 2], [3, 4]]
1 b x [[1, 2], [3, 4]]
1 a y [[1, 2], [3, 4]]
1 b y [[1, 2], [3, 4]]
2 a x [[4, 5], [6, 7]]
2 b x [[4, 5], [6, 7]]
2 a y [[4, 5], [6, 7]]
2 b y [[4, 5], [6, 7]]
and I want to have a new data frame of the type:
dd.pivot_table(values=['mat'], rows=['date'], cols=['foo'], aggfunc= ??)
where my 2-d arrays will be an element-by-element sum of the arrays with same value in the 'foo' columns. How can I do that? If not possible, is it possible to pick the first occurrence of the 'mat' element in the list of arrays with same 'foo'? Thanks
added the desired output:
date a b
1 [[2, 4], [6, 8]] [[8, 10], [12, 14]]
2 [[2, 4], [6, 8]] [[8, 10], [12, 14]]
Upvotes: 1
Views: 1864
Reputation: 117626
You can group first and then pivot:
>>> grouped = df.groupby(('foo', 'date'))
>>> g = grouped['mat'].apply(lambda x: np.array(map(np.array, x.values)).T.sum(axis=2).T).reset_index()
>>> g
foo date 0
0 a 1 [[2, 4], [6, 8]]
1 a 2 [[8, 10], [12, 14]]
2 b 1 [[2, 4], [6, 8]]
3 b 2 [[8, 10], [12, 14]]
>>> g.pivot(columns='foo', values=0, index='date').reset_index()
foo date a b
0 1 [[2, 4], [6, 8]] [[2, 4], [6, 8]]
1 2 [[8, 10], [12, 14]] [[8, 10], [12, 14]]
To sum by elements I've used numpy sum over axis=2
(converted lists into np.array
beforehand). Also, seeems that your output is a bit incorrect - it should be:
date a b
1 [[2, 4], [6, 8]] [[8, 10], [12, 14]]
2 [[2, 4], [6, 8]] [[8, 10], [12, 14]]
Upvotes: 1