Reputation: 17631

Pivot a pandas DataFrame to be the correct format: `DataError: No numeric types to aggregate`

Here is a pandas DataFrame I would like to manipulate:

import pandas as pd

data = {"grouping": ["item1", "item1", "item1", "item2", "item2", "item2", "item2", ...],
        "labels": ["A", "B", "C", "A", "B", "C", "D", ...],
        "count": [5, 1, 8, 3, 731, 189, 9, ...]}

df = pd.DataFrame(data)

print(df)
>>>   grouping            labels       count
0        item1             A            5
1        item1             B            1
2        item1             C            8
3        item2             A            3
4        item2             B          731
5        item2             C          189
6        item2             D            9
7        ...               ...         ....

I would like to "unfold" this dataframe into the following format:

grouping    A    B    C    D
item1       5    1    8    3
item2       3    731  189  9
....        ........

How would one do this? I would think that this would work:

pd.pivot_table(df,index=["grouping", "labels"]

but I get the following error:

DataError: No numeric types to aggregate

Upvotes: 3

Answers (4)

piRSquared

Reputation: 294258

There are four idiomatic pandas ways to do this.

No duplicates among grouping columns. Does not require aggregation
- pivot
- set_index
Duplicates among grouping columns. Does require aggregation
- pivot_table
- groupby

pivot

df.pivot('grouping', 'labels', 'count')

set_index

df.set_index(['grouping', 'labels'])['count'].unstack()

pivot_table

df.pivot_table('count', 'grouping', 'labels')

groupby

df.groupby(['grouping', 'labels'])['count'].sum().unstack()

All yield

labels      A      B      C    D
grouping                        
item1     5.0    1.0    8.0  NaN
item2     3.0  731.0  189.0  9.0

timing

With the groupby, set_index, or pivot_table approach, you can easily fill in missing values with fill_value=0

df.pivot_table('count', 'grouping', 'labels', fill_value=0)

df.groupby(['grouping', 'labels'])['count'].sum().unstack(fill_value=0)

df.set_index(['grouping', 'labels'])['count'].sum().unstack(fill_value=0)

All yield

labels    A    B    C  D
grouping                
item1     5    1    8  0
item2     3  731  189  9

Additional thoughts on groupby

Because we don't require any aggregation. If we wanted to use groupby, we can minimize the impact of the implicit aggregation by utilizing a less impactful aggregator.

df.groupby(['grouping', 'labels'])['count'].max().unstack()

df.groupby(['grouping', 'labels'])['count'].first().unstack()

timing groupby

Upvotes: 5

Scott Boston

Reputation: 153460

Use set_index and unstack:

df = df.set_index(['grouping','labels']).unstack().rename_axis(None)
df.columns = df.columns.droplevel()
print(df)

Output:

labels  A    B    C     D
item1   5    1    8  None
item2   3  731  189     9

Upvotes: 6

greole

Reputation: 4771

Try:

In [1]: import pandas as pd
   ...: 
   ...: data = {"grouping": ["item1", "item1", "item1", "item2", "item2", "item2", "item2"],
   ...:         "labels": ["A", "B", "C", "A", "B", "C", "D"],
   ...:         "count": [5, 1, 8, 3, 731, 189, 9]}
   ...: 
In [2]: df = pd.DataFrame(data)
In [3]: df.pivot_table(index="grouping",columns="labels")

Out[3]: 
             count              
    labels       A    B    C   D
    grouping                    
    item1        5    1    8 NaN
    item2        3  731  189   9

Upvotes: 2

BrenBarn

Reputation: 251383

You put labels in the index, but you want it in the columns:

>>> df.pivot_table(index='grouping', columns='labels')
         count                   
labels       A      B      C    D
grouping                         
item1      5.0    1.0    8.0  NaN
item2      3.0  731.0  189.0  9.0

Note that this makes the columns a MultiIndex. If you don't want that, explicitly pass values: df.pivot_table(index='grouping', columns='labels', values='count').

Also, note that the kind of reshape you seem to be looking for will only be possible if each combination of grouping and label has exactly one or zero values. If any combination occurs more than once, you need to decide how to aggregate them (e.g., by summing the matching values).

Upvotes: 3

Pivot a pandas DataFrame to be the correct format: `DataError: No numeric types to aggregate`

Answers (4)

Related Questions