Reputation: 8816
I'm Just trying to play will pivot
to transform the data Frame in a way i desired which pivot does but here its not working.
Any expert insight please.
>>> df1
id item value
0 2225 prize 1.5
1 2225 unit kg
2 2225 prize 2.4
3 8187 unit lt
4 1401 stock 10
5 1401 prize 4.3
While running pivot
i see the below error..
>>> df1.pivot('id', 'item')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/grid/common/pkgs/python/v3.6.1/lib/python3.6/site-packages/pandas/core/frame.py", line 4359, in pivot
return pivot(self, index=index, columns=columns, values=values)
File "/grid/common/pkgs/python/v3.6.1/lib/python3.6/site-packages/pandas/core/reshape/reshape.py", line 381, in pivot
return indexed.unstack(columns)
File "/grid/common/pkgs/python/v3.6.1/lib/python3.6/site-packages/pandas/core/frame.py", line 4546, in unstack
return unstack(self, level, fill_value)
File "/grid/common/pkgs/python/v3.6.1/lib/python3.6/site-packages/pandas/core/reshape/reshape.py", line 469, in unstack
return _unstack_frame(obj, level, fill_value=fill_value)
File "/grid/common/pkgs/python/v3.6.1/lib/python3.6/site-packages/pandas/core/reshape/reshape.py", line 488, in _unstack_frame
fill_value=fill_value)
File "/grid/common/pkgs/python/v3.6.1/lib/python3.6/site-packages/pandas/core/reshape/reshape.py", line 116, in __init__
self._make_selectors()
File "/grid/common/pkgs/python/v3.6.1/lib/python3.6/site-packages/pandas/core/reshape/reshape.py", line 154, in _make_selectors
raise ValueError('Index contains duplicate entries, '
ValueError: Index contains duplicate entries, cannot reshape
Even pivot_table
producing errors..
>>> df1.pivot_table(columns='item', values='value')
I tried to github link reference but didn't get it.
Desired should be like:
Value
item prize stock unit
id
2225 1.5 10 2.4 lt
Upvotes: 2
Views: 122
Reputation: 862511
Problem of data is there are duplicates and also mixed numeric with strings in column value
.
General solution - if need for numeric values mean
and for duplicates strings join
:
def f(x):
y = pd.to_numeric(x, errors='coerce')
if y.isna().all():
return ', '.join(x)
else:
return y.mean()
df = df1.pivot_table(index='id',columns='item', values='value', aggfunc=f)
print (df)
item prize stock unit
id
1401 4.3 10 NaN
2225 1.95 NaN kg
8187 NaN NaN lt
Another idea is aggregate numeric and non numeric separate and then concat
together:
df1['value1'] = pd.to_numeric(df1['value'], errors='coerce')
df2 = df1.pivot_table(index='id',columns='item', values='value1', aggfunc='mean')
df3 = df1[df1['value1'].isna()]
.pivot_table(index='id',columns='item', values='value', aggfunc=','.join)
df = pd.concat([df2, df3], axis=1)
print (df)
item prize stock unit
id
1401 4.30 10.0 NaN
2225 1.95 NaN kg
8187 NaN NaN lt
Upvotes: 2
Reputation: 741
you should try the following:
pd.pivot_table(df1, values='value', index=['id'], columns=['item'], aggfunc=np.sum)
And as mentioned below by jezrael, aggfunc can be mean for numeric values and for strings can be join.
Upvotes: 3
Reputation: 947
Pandas is complaining about the fact that you have the entry (2225, prize)
twice at indices 0 and 2. This is a database problem rather than a pandas
misbehaviour.
Fixing this duplicate entry removes the error:
# the original database
>>> df
id item value
0 2225 prize 1.5
1 2225 unit kg
2 2225 prize 2.4
3 8187 unit lt
4 1401 stock 10
5 1401 prize 4.3
# removing the duplicate error by changing index 2
>>> df.loc[2, 'id'] = 8187
>>> df
id item value
0 2225 prize 1.5
1 2225 unit kg
2 8187 prize 2.4
3 8187 unit lt
4 1401 stock 10
5 1401 prize 4.3
# pivot now works properly
>>> df.pivot('id', 'item')
value
item prize stock unit
id
1401 4.3 10 NaN
2225 1.5 NaN kg
8187 2.4 NaN lt
Upvotes: -1
Reputation: 153
According to the pivot doc
Raises
------
ValueError:
When there are any `index`, `columns` combinations with multiple
values. `DataFrame.pivot_table` when you need to aggregate.
In your case, id=2225
has 2 prize
entries, which is not handled by pivot
. You could aggregate first and then pivot:
df1.groupby(['id', 'item']).sum().reset_index().pivot('id', 'item', 'value')
+------+-------+-------+------+
| item | prize | stock | unit |
+------+-------+-------+------+
| id | | | |
| 1401 | 4.3 | 10 | NaN |
| 2225 | 3.9 | NaN | kg |
| 8187 | NaN | NaN | lt |
+------+-------+-------+------+
Upvotes: 2