Reputation: 15
I don't know how to ask this question, but i'll try do explain my case.
I have a dataset with the data as following:
Product | Value | Value type | year |
---|---|---|---|
A | 21,5 | Price | 21 |
A | 5 | Volume | 21 |
B | 55,3 | Price | 21 |
B | 10 | Volume | 21 |
C | 70,0 | Price | 21 |
D | 37,5 | Price | 21 |
D | 7,7 | Volume | 21 |
And I want to reach something like that:
Product | Price | Volume | Year |
---|---|---|---|
A | 21,5 | 5 | 21 |
B | 55,3 | 10 | 21 |
c | 70,0 | - | 21 |
D | 37,0 | 7,7 | 21 |
I mind that the unstack function can solve the problem, but i don't know how, cause i'm not getting all the columns back.
I found a complex solution but it's not working.
container = []
for label, _df in df.groupby(['Year','Product']):
_df.set_index('Value type', inplace = True)
container.append(pd.DataFrame({
"Product": [label[1]],
"Price":[_df.loc['Price', 'Value']],
"Volume": [_df.loc['Volume', 'Value']],
"Year":[label[0]]}))
df_new = pd.concat(container)
This solution doesn't work, because the missing line for Volume for product C.
How can I reach the expected dataframe? Is there any fast way to calculate this?
Upvotes: 0
Views: 71
Reputation: 120399
Use pivot
:
out = df.pivot(index=['Product', 'year'], columns='Value type', values=['Value']) \
.droplevel(0, axis=1).reset_index().rename_axis(None, axis=1) \
[['Product', 'Price', 'Volume', 'year']]
>>> out
Product Price Volume year
0 A 21.5 5.0 21
1 B 55.3 10.0 21
2 C 70.0 NaN 21
3 D 37.5 7.7 21
Upvotes: 1