Karn Kumar
Karn Kumar

Reputation: 8816

pandas pivot to transform DataFrame

I'm Just trying to play will pivot to transform the data Frame in a way i desired which pivot does but here its not working.

Any expert insight please.

DataFrame:

>>> df1
     id   item value
0  2225  prize   1.5
1  2225   unit    kg
2  2225  prize   2.4
3  8187   unit    lt
4  1401  stock    10
5  1401  prize   4.3

While running pivot i see the below error..

>>> df1.pivot('id', 'item')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/grid/common/pkgs/python/v3.6.1/lib/python3.6/site-packages/pandas/core/frame.py", line 4359, in pivot
    return pivot(self, index=index, columns=columns, values=values)
  File "/grid/common/pkgs/python/v3.6.1/lib/python3.6/site-packages/pandas/core/reshape/reshape.py", line 381, in pivot
    return indexed.unstack(columns)
  File "/grid/common/pkgs/python/v3.6.1/lib/python3.6/site-packages/pandas/core/frame.py", line 4546, in unstack
    return unstack(self, level, fill_value)
  File "/grid/common/pkgs/python/v3.6.1/lib/python3.6/site-packages/pandas/core/reshape/reshape.py", line 469, in unstack
    return _unstack_frame(obj, level, fill_value=fill_value)
  File "/grid/common/pkgs/python/v3.6.1/lib/python3.6/site-packages/pandas/core/reshape/reshape.py", line 488, in _unstack_frame
    fill_value=fill_value)
  File "/grid/common/pkgs/python/v3.6.1/lib/python3.6/site-packages/pandas/core/reshape/reshape.py", line 116, in __init__
    self._make_selectors()
  File "/grid/common/pkgs/python/v3.6.1/lib/python3.6/site-packages/pandas/core/reshape/reshape.py", line 154, in _make_selectors
    raise ValueError('Index contains duplicate entries, '
ValueError: Index contains duplicate entries, cannot reshape

Even pivot_table producing errors..

>>> df1.pivot_table(columns='item', values='value')

I tried to github link reference but didn't get it.

Desired should be like:

      Value

item  prize  stock  unit
    id

2225  1.5  10  2.4  lt

Upvotes: 2

Views: 122

Answers (4)

jezrael
jezrael

Reputation: 862511

Problem of data is there are duplicates and also mixed numeric with strings in column value.

General solution - if need for numeric values mean and for duplicates strings join:

def f(x):
    y = pd.to_numeric(x, errors='coerce')
    if y.isna().all():
        return ', '.join(x)
    else:
        return  y.mean()

df = df1.pivot_table(index='id',columns='item', values='value', aggfunc=f)
print (df)
item prize stock unit
id                   
1401   4.3    10  NaN
2225  1.95   NaN   kg
8187   NaN   NaN   lt

Another idea is aggregate numeric and non numeric separate and then concat together:

df1['value1'] = pd.to_numeric(df1['value'], errors='coerce')

df2 = df1.pivot_table(index='id',columns='item', values='value1', aggfunc='mean')

df3 = df1[df1['value1'].isna()]
            .pivot_table(index='id',columns='item', values='value', aggfunc=','.join)

df = pd.concat([df2, df3], axis=1)
print (df)

item  prize  stock unit
id                     
1401   4.30   10.0  NaN
2225   1.95    NaN   kg
8187    NaN    NaN   lt

Upvotes: 2

Deven
Deven

Reputation: 741

you should try the following:

pd.pivot_table(df1, values='value', index=['id'], columns=['item'], aggfunc=np.sum)

And as mentioned below by jezrael, aggfunc can be mean for numeric values and for strings can be join.

Upvotes: 3

Gr&#233;goire Roussel
Gr&#233;goire Roussel

Reputation: 947

Pandas is complaining about the fact that you have the entry (2225, prize) twice at indices 0 and 2. This is a database problem rather than a pandas misbehaviour.

Fixing this duplicate entry removes the error:

# the original database
>>> df 
     id   item value                                                                                                                                                                                                                          
0  2225  prize   1.5                                                                                                                                                                                                                          
1  2225   unit    kg                                                                                                                                                                                                                          
2  2225  prize   2.4                                                                                                                                                                                                                          
3  8187   unit    lt                                                                                                                                                                                                                          
4  1401  stock    10                                                                                                                                                                                                                          
5  1401  prize   4.3  

# removing the duplicate error by changing index 2
>>> df.loc[2, 'id'] = 8187
>>> df
     id   item value
0  2225  prize   1.5
1  2225   unit    kg
2  8187  prize   2.4
3  8187   unit    lt
4  1401  stock    10
5  1401  prize   4.3

# pivot now works properly
>>> df.pivot('id', 'item')
     value
item prize stock unit
id
1401   4.3    10  NaN
2225   1.5   NaN   kg
8187   2.4   NaN   lt

Upvotes: -1

Sebastien
Sebastien

Reputation: 153

According to the pivot doc

Raises
------
ValueError:
    When there are any `index`, `columns` combinations with multiple
    values. `DataFrame.pivot_table` when you need to aggregate.

In your case, id=2225 has 2 prize entries, which is not handled by pivot. You could aggregate first and then pivot:

df1.groupby(['id', 'item']).sum().reset_index().pivot('id', 'item', 'value')

+------+-------+-------+------+
| item | prize | stock | unit |
+------+-------+-------+------+
| id   |       |       |      |
| 1401 | 4.3   | 10    | NaN  |
| 2225 | 3.9   | NaN   | kg   |
| 8187 | NaN   | NaN   | lt   |
+------+-------+-------+------+

Upvotes: 2

Related Questions