8t12c7081
8t12c7081

Reputation: 743

Pandas: Transpose/Manipulate Dataframe

Let's say I have a dataframe as follows. The dataframe describes the makeup of paints, such that the makeup of an arbitrarily named paint (NAME column) could be described by mixing specific colors (subtype column) in a given percentage (weight col) and specific colors can also be differentiated by a generalized parent (type). Date is included as a sanity check but is not used here.

|weight|NAME       |type  |subtype    |date      |
--------------------------------------------------
|93.35 |candyapple |red   |maroon     |2018-06-30|
|6.65  |candyapple |red   |crimson    |2018-06-30|
|93.41 |grannysmith|green |limegreen  |2010-03-31|
|1.78  |grannysmith|green |deepgreen  |2019-12-31|
|0.72  |grannysmith|yellow|goldyellow |2019-12-31|
|2.96  |grannysmith|brown |lightbrown |2014-10-31|
|33.33 |awfulbrown |red   |maroon     |2020-10-31|
|33.33 |awfulbrown |yellow|plainyellow|2010-06-30|
|33.33 |awfulbrown |green |deepgreen  |2020-02-29|
--------------------------------------------------

So candyapple's full makeup is 93.35% crimson and 6.65% maroon, which are both subtypes of red. grannysmith can be represented by its subtypes as above, but we can also refer to it as by type as 95.19% green, as the sum of its green subtypes, and 0.72% yellow and 2.96% brown. Names used for subtypes and types are common across paint configs, but not all configs will list all subtypes. If a subtype is not listed it is assumed to be 0.00%. So, for example, we see that candyapple has not listed any green - we can assume it is 0.00% limegreen.

  1. Using pandas and python, how can I manipulate this dataframe to fit the following structure?
|NAME       |maroon|crimson|limegreen|deepgreen|goldyellow|lightbrown|maroon|plainyellow|deepgreen|
---------------------------------------------------------------------------------------------------
|candyapple |93.35 |6.65   |0.00     |0.00     |0.00      |0.00      |0.00  |0.00       |0.00     |
|grannysmith|0.00  |0.00   |93.41    |1.78     |0.72      |2.96      |0.00  |0.00       |0.00     |
|awfulbrown |33.33 |0.00   |0.00     |33.33    |0.00      |0.00      |0.00  |33.33      |0.00     |
---------------------------------------------------------------------------------------------------

1a. Using pandas, how do I transpose such that values for subtype become the column headers and all values are sorted into a single row by NAME?

1b. Having transposed, how should I fill any empty spaces in the table with 0.00? (eg. candyapple is 0.00% limegreen)

  1. Additionally, how could I use pandas to create a similar frame, but for type instead of subtype? Weighting for type is the sum of weights for its subtypes.
|NAME       |red   |green  |yellow   |brown  |
----------------------------------------------
|candyapple |100.00|0.00   |0.00     |0.00   |
|grannysmith|0.00  |95.19  |0.72     |2.96   |
|awfulbrown |33.33 |33.33  |33.33    |0.00   |
----------------------------------------------

2a. Having transposed as in (1), but using type this time, how do I use pandas/python to sum the values such that weight for a given type will be the sum of weights for its subtype?

  1. (Added) Can we combine the two, as below:
|NAME       |red   |green  |yellow   |brown  |maroon|crimson|limegreen|deepgreen|goldyellow|lightbrown|maroon|plainyellow|deepgreen|
---------------------------------------------------------------------------------------------------
|candyapple |100.00|0.00   |0.00     |0.00   |93.35 |6.65   |0.00     |0.00     |0.00      |0.00      |0.00  |0.00       |0.00     |
|grannysmith|0.00  |95.19  |0.72     |2.96   |0.00  |0.00   |93.41    |1.78     |0.72      |2.96      |0.00  |0.00       |0.00     |
|awfulbrown |33.33 |33.33  |33.33    |0.00   |33.33 |0.00   |0.00     |33.33    |0.00      |0.00      |0.00  |33.33      |0.00     |
---------------------------------------------------------------------------------------------------

3a. Does pandas have a method to create, from the original dataset, the above combined DF of both aggregated sums of type and the individual weights of the subtypes?

Upvotes: 1

Views: 171

Answers (2)

Umar.H
Umar.H

Reputation: 23099

Using pd.crosstab

subtypes = pd.crosstab(df.NAME,df.type,df.weight,aggfunc='sum')

types = pd.crosstab(df.NAME,df.subtype,df.weight,aggfunc='sum')

final = pd.concat([types,subtypes],axis=1)

1

print(subtypes)

subtype      crimson  deepgreen  goldyellow  lightbrown  limegreen  maroon  \
NAME                                                                         
awfulbrown       NaN      33.33         NaN         NaN        NaN   33.33   
candyapple      6.65        NaN         NaN         NaN        NaN   93.35   
grannysmith      NaN       1.78        0.72        2.96      93.41     NaN   

subtype      plainyellow  
NAME                      
awfulbrown         33.33  
candyapple           NaN  
grannysmith          NaN  

2

print(types)
type         brown   green   red     yellow
NAME                                       
awfulbrown      NaN   33.33   33.33   33.33
candyapple      NaN     NaN  100.00     NaN
grannysmith    2.96   95.19     NaN    0.72

3

print(final.fillna(0))

             brown   green   red     yellow  crimson  deepgreen  goldyellow  \
NAME                                                                          
awfulbrown     0.00   33.33   33.33   33.33     0.00      33.33        0.00   
candyapple     0.00    0.00  100.00    0.00     6.65       0.00        0.00   
grannysmith    2.96   95.19    0.00    0.72     0.00       1.78        0.72   

             lightbrown  limegreen  maroon  plainyellow  
NAME                                                     
awfulbrown         0.00       0.00   33.33        33.33  
candyapple         0.00       0.00   93.35         0.00  
grannysmith        2.96      93.41    0.00         0.00  

Upvotes: 2

yatu
yatu

Reputation: 88246

For the first case a pivot is enough, since there is no aggregation needed:

df.pivot('NAME', 'subtype', 'weight').fillna(0)

subtype      crimson  deepgreen  goldyellow  lightbrown  limegreen  maroon  \
NAME                                                                         
awfulbrown      0.00      33.33        0.00        0.00       0.00   33.33   
candyapple      6.65       0.00        0.00        0.00       0.00   93.35   
grannysmith     0.00       1.78        0.72        2.96      93.41    0.00   

subtype      plainyellow  
NAME                      
awfulbrown         33.33  
candyapple          0.00  
grannysmith         0.00  

For the second case you can use pivot_table, and aggregate with the sum:

df.pivot_table(index='NAME', columns='type', values='weight', aggfunc='sum', fill_value=0)

type         brown  green     red  yellow
NAME                                     
awfulbrown    0.00  33.33   33.33   33.33
candyapple    0.00   0.00  100.00    0.00
grannysmith   2.96  95.19    0.00    0.72

Upvotes: 3

Related Questions