Indrid
Indrid

Reputation: 1212

Pandas How To Pivot Without Losing Data

I'm trying to assess Power Query/M with Pandas for some repeatable transformations. So far so good, but I've hit a brick wall with the following scenario in Pandas and would be very grateful for pointers as to what (I presume) I am doing wrong.

I have a DataFrame which looks like:

    sales rep   quarter result  value
0   adam        q1      target  3000
1   ben         q1      target  3200
2   cal         q1      target  2900
3   dan         q1      target  3400
4   adam        q1      actual  2900
5   ben         q1      actual  2100
6   cal         q1      actual  2100
7   dan         q1      actual  3500
8   adam        q2      target  4000
9   ben         q2      target  3200
10  cal         q2      target  7200
11  dan         q2      target  5200
12  adam        q2      actual  3400
13  ben         q2      actual  3900
14  cal         q2      actual  9000
15  dan         q2      actual  6000
16  adam        q3      target  4000
17  ben         q3      target  4000
18  cal         q3      target  7200
19  dan         q3      target  6400
20  adam        q3      actual  4500
21  ben         q3      actual  4000
22  cal         q3      actual  6500
23  dan         q3      actual  6500
24  adam        q4      target  5200
25  ben         q4      target  4300
26  cal         q4      target  7000
27  dan         q4      target  6500
28  adam        q4      actual  5100
29  ben         q4      actual  4300
30  cal         q4      actual  6900
31  dan         q4      actual  7200

And I am trying to pivot the data on the 'result' column using the 'value' column for the values. I need to retain the 'sales rep' and the 'quarter' columns. Essentially I am trying to get to (and I get by default on the unpivot operation in Power Query) is:

sales rep   quarter target  actual
adam        q1      3000    2900
adam        q2      4000    3400
adam        q3      4000    4500
adam        q4      5200    5100
ben         q1      3200    2100
ben         q2      3200    3900
ben         q3      4000    4000
ben         q4      4300    4300
cal         q1      2900    2100
cal         q2      7200    9000
cal         q3      7200    6500
cal         q4      7000    6900
dan         q1      3400    3500
dan         q2      5200    6000
dan         q3      6400    6500
dan         q4      6500    7200

Using the pivot_table method in Pandas I get: enter image description here

Which is, obviously, a pivot table which makes it much harder to then do any further manipulations on the table. Is there any way to get there in Pandas?

Thanks!

Upvotes: 0

Views: 1302

Answers (1)

RichieV
RichieV

Reputation: 5183

DataFrame.pivot() is a specialized or second level function designed to mix .groupby() and .unstack() into one method. Pivot and reset_index will work fine in this example.

However, .groupby() is a more generalized method that offers the main benefit of being able to group by any Indexer other than a column :

  • an series not in the DataFrame
  • a level of the MultiIndex
  • the df single-level Index containing duplicates
  • a function applied to a column (e.g. df.col1 > 100 or np.sign(df.col1))
  • any combination of the above
  • basically any Indexer that has THE SAME LENGTH as the DataFrame being grouped

On top of that big advantage, groupby's ability to combine with apply and transform makes virtually every data analysis possible.

You might want to get familiar Pandas Groupby user guide

Upvotes: 1

Related Questions