Pandas: Transpose/Manipulate Dataframe

Question

Let's say I have a dataframe as follows. The dataframe describes the makeup of paints, such that the makeup of an arbitrarily named paint (NAME column) could be described by mixing specific colors (subtype column) in a given percentage (weight col) and specific colors can also be differentiated by a generalized parent (type). Date is included as a sanity check but is not used here.

|weight|NAME       |type  |subtype    |date      |
--------------------------------------------------
|93.35 |candyapple |red   |maroon     |2018-06-30|
|6.65  |candyapple |red   |crimson    |2018-06-30|
|93.41 |grannysmith|green |limegreen  |2010-03-31|
|1.78  |grannysmith|green |deepgreen  |2019-12-31|
|0.72  |grannysmith|yellow|goldyellow |2019-12-31|
|2.96  |grannysmith|brown |lightbrown |2014-10-31|
|33.33 |awfulbrown |red   |maroon     |2020-10-31|
|33.33 |awfulbrown |yellow|plainyellow|2010-06-30|
|33.33 |awfulbrown |green |deepgreen  |2020-02-29|
--------------------------------------------------

So candyapple's full makeup is 93.35% crimson and 6.65% maroon, which are both subtypes of red. grannysmith can be represented by its subtypes as above, but we can also refer to it as by type as 95.19% green, as the sum of its green subtypes, and 0.72% yellow and 2.96% brown. Names used for subtypes and types are common across paint configs, but not all configs will list all subtypes. If a subtype is not listed it is assumed to be 0.00%. So, for example, we see that candyapple has not listed any green - we can assume it is 0.00% limegreen.

Using pandas and python, how can I manipulate this dataframe to fit the following structure?

|NAME       |maroon|crimson|limegreen|deepgreen|goldyellow|lightbrown|maroon|plainyellow|deepgreen|
---------------------------------------------------------------------------------------------------
|candyapple |93.35 |6.65   |0.00     |0.00     |0.00      |0.00      |0.00  |0.00       |0.00     |
|grannysmith|0.00  |0.00   |93.41    |1.78     |0.72      |2.96      |0.00  |0.00       |0.00     |
|awfulbrown |33.33 |0.00   |0.00     |33.33    |0.00      |0.00      |0.00  |33.33      |0.00     |
---------------------------------------------------------------------------------------------------

1a. Using pandas, how do I transpose such that values for subtype become the column headers and all values are sorted into a single row by NAME?

1b. Having transposed, how should I fill any empty spaces in the table with 0.00? (eg. candyapple is 0.00% limegreen)

Additionally, how could I use pandas to create a similar frame, but for type instead of subtype? Weighting for type is the sum of weights for its subtypes.

|NAME       |red   |green  |yellow   |brown  |
----------------------------------------------
|candyapple |100.00|0.00   |0.00     |0.00   |
|grannysmith|0.00  |95.19  |0.72     |2.96   |
|awfulbrown |33.33 |33.33  |33.33    |0.00   |
----------------------------------------------

2a. Having transposed as in (1), but using type this time, how do I use pandas/python to sum the values such that weight for a given type will be the sum of weights for its subtype?

(Added) Can we combine the two, as below:

|NAME       |red   |green  |yellow   |brown  |maroon|crimson|limegreen|deepgreen|goldyellow|lightbrown|maroon|plainyellow|deepgreen|
---------------------------------------------------------------------------------------------------
|candyapple |100.00|0.00   |0.00     |0.00   |93.35 |6.65   |0.00     |0.00     |0.00      |0.00      |0.00  |0.00       |0.00     |
|grannysmith|0.00  |95.19  |0.72     |2.96   |0.00  |0.00   |93.41    |1.78     |0.72      |2.96      |0.00  |0.00       |0.00     |
|awfulbrown |33.33 |33.33  |33.33    |0.00   |33.33 |0.00   |0.00     |33.33    |0.00      |0.00      |0.00  |33.33      |0.00     |
---------------------------------------------------------------------------------------------------

3a. Does pandas have a method to create, from the original dataset, the above combined DF of both aggregated sums of type and the individual weights of the subtypes?

yatu · Accepted Answer

For the first case a pivot is enough, since there is no aggregation needed:

df.pivot('NAME', 'subtype', 'weight').fillna(0)

subtype      crimson  deepgreen  goldyellow  lightbrown  limegreen  maroon  \
NAME                                                                         
awfulbrown      0.00      33.33        0.00        0.00       0.00   33.33   
candyapple      6.65       0.00        0.00        0.00       0.00   93.35   
grannysmith     0.00       1.78        0.72        2.96      93.41    0.00   

subtype      plainyellow  
NAME                      
awfulbrown         33.33  
candyapple          0.00  
grannysmith         0.00

For the second case you can use pivot_table, and aggregate with the sum:

df.pivot_table(index='NAME', columns='type', values='weight', aggfunc='sum', fill_value=0)

type         brown  green     red  yellow
NAME                                     
awfulbrown    0.00  33.33   33.33   33.33
candyapple    0.00   0.00  100.00    0.00
grannysmith   2.96  95.19    0.00    0.72

Pandas: Transpose/Manipulate Dataframe

Answers (2)

1

2

3

Related Questions