Is it possible to reshape pandas DataFrame using parts of column names?

Question

I am just getting started with working in pandas and with dataframes. I'd like to reshape some data but I'm not sure the best approach to do so. My instinct says to iterate over the frame but I'm hoping there is some better way.

So, I have an initial dataframe that looks like this:

vendor_state	client_state	date	total_widget_a_purchases	total_widget_b_purchases
CA	WA	2021-02-01	10	5
CA	OR	2021-02-01	8	7
NY	NJ	2021-03-07	15	9
NY	NJ	2021-02-08	7	25
NY	NY	2021-02-08	24	3

I would like to get it to the following state:

vendor_state	client_state	widget type	2021-02-01	2021-02-08	2021-03-07
CA	WA	widget_a	10	0	0
CA	WA	widget_b	5	0	0
NY	NJ	widget_a	0	7	15
NY	NJ	widget_b	0	25	9
NY	NY	widget_a	0	24	0
NY	NY	widget_b	0	3	0

There are two areas I'm really struggling with here.

Is there some way for me to gather the widget_a and widget_b from the original column names to be in the widget type column of the result?
Is there a good operation to end up with the columns in the way I would like? It feels to me like some sort of pivot would work but that has tended to end in columns like

CA/WA/2021-02-01	CA/WA/2021-02-08	CA/WA/2021-03-07

I am hoping that I am just missing something elementary due to not having worked with pandas in the past.

Nk03 · Accepted Answer

via stack unstack:

df = (df.set_index(['vendor_state','client_state','date'])
 .stack()
 .unstack(2)
 .reset_index()
 .rename(columns={'level_2': 'widget type'})
 .fillna(0)
 )
df['widget type'] = df['widget type'].str.extract(pat = ("(widget_[a|b])"))

Output:

	vendor_state	client_state	widget type	2021-02-01	2021-02-08	2021-03-07
0	CA	OR	widget_a	8.0	0.0	0.0
1	CA	OR	widget_b	7.0	0.0	0.0
2	CA	WA	widget_a	10.0	0.0	0.0
3	CA	WA	widget_b	5.0	0.0	0.0
4	NY	NJ	widget_a	0.0	7.0	15.0
5	NY	NJ	widget_b	0.0	25.0	9.0
6	NY	NY	widget_a	0.0	24.0	0.0
7	NY	NY	widget_b	0.0	3.0	0.0

Is it possible to reshape pandas DataFrame using parts of column names?

Answers (1)

Related Questions