Reputation: 107
I am just getting started with working in pandas and with dataframes. I'd like to reshape some data but I'm not sure the best approach to do so. My instinct says to iterate over the frame but I'm hoping there is some better way.
So, I have an initial dataframe that looks like this:
vendor_state | client_state | date | total_widget_a_purchases | total_widget_b_purchases |
---|---|---|---|---|
CA | WA | 2021-02-01 | 10 | 5 |
CA | OR | 2021-02-01 | 8 | 7 |
NY | NJ | 2021-03-07 | 15 | 9 |
NY | NJ | 2021-02-08 | 7 | 25 |
NY | NY | 2021-02-08 | 24 | 3 |
I would like to get it to the following state:
vendor_state | client_state | widget type | 2021-02-01 | 2021-02-08 | 2021-03-07 |
---|---|---|---|---|---|
CA | WA | widget_a | 10 | 0 | 0 |
CA | WA | widget_b | 5 | 0 | 0 |
NY | NJ | widget_a | 0 | 7 | 15 |
NY | NJ | widget_b | 0 | 25 | 9 |
NY | NY | widget_a | 0 | 24 | 0 |
NY | NY | widget_b | 0 | 3 | 0 |
There are two areas I'm really struggling with here.
CA/WA/2021-02-01 | CA/WA/2021-02-08 | CA/WA/2021-03-07 |
---|
I am hoping that I am just missing something elementary due to not having worked with pandas in the past.
Upvotes: 3
Views: 166
Reputation: 14949
via stack unstack
:
df = (df.set_index(['vendor_state','client_state','date'])
.stack()
.unstack(2)
.reset_index()
.rename(columns={'level_2': 'widget type'})
.fillna(0)
)
df['widget type'] = df['widget type'].str.extract(pat = ("(widget_[a|b])"))
Output:
vendor_state | client_state | widget type | 2021-02-01 | 2021-02-08 | 2021-03-07 | |
---|---|---|---|---|---|---|
0 | CA | OR | widget_a | 8.0 | 0.0 | 0.0 |
1 | CA | OR | widget_b | 7.0 | 0.0 | 0.0 |
2 | CA | WA | widget_a | 10.0 | 0.0 | 0.0 |
3 | CA | WA | widget_b | 5.0 | 0.0 | 0.0 |
4 | NY | NJ | widget_a | 0.0 | 7.0 | 15.0 |
5 | NY | NJ | widget_b | 0.0 | 25.0 | 9.0 |
6 | NY | NY | widget_a | 0.0 | 24.0 | 0.0 |
7 | NY | NY | widget_b | 0.0 | 3.0 | 0.0 |
Upvotes: 4