Reputation: 743
Let's say I have a dataframe as follows. The dataframe describes the makeup of paints, such that the makeup of an arbitrarily named paint (NAME column) could be described by mixing specific colors (subtype column) in a given percentage (weight col) and specific colors can also be differentiated by a generalized parent (type). Date is included as a sanity check but is not used here.
|weight|NAME |type |subtype |date |
--------------------------------------------------
|93.35 |candyapple |red |maroon |2018-06-30|
|6.65 |candyapple |red |crimson |2018-06-30|
|93.41 |grannysmith|green |limegreen |2010-03-31|
|1.78 |grannysmith|green |deepgreen |2019-12-31|
|0.72 |grannysmith|yellow|goldyellow |2019-12-31|
|2.96 |grannysmith|brown |lightbrown |2014-10-31|
|33.33 |awfulbrown |red |maroon |2020-10-31|
|33.33 |awfulbrown |yellow|plainyellow|2010-06-30|
|33.33 |awfulbrown |green |deepgreen |2020-02-29|
--------------------------------------------------
So candyapple
's full makeup is 93.35% crimson
and 6.65% maroon
, which are both subtypes of red. grannysmith
can be represented by its subtypes as above, but we can also refer to it as by type as 95.19% green
, as the sum of its green subtypes, and 0.72% yellow
and 2.96% brown
. Names used for subtypes and types are common across paint configs, but not all configs will list all subtypes. If a subtype is not listed it is assumed to be 0.00%. So, for example, we see that candyapple
has not listed any green
- we can assume it is 0.00% limegreen
.
|NAME |maroon|crimson|limegreen|deepgreen|goldyellow|lightbrown|maroon|plainyellow|deepgreen|
---------------------------------------------------------------------------------------------------
|candyapple |93.35 |6.65 |0.00 |0.00 |0.00 |0.00 |0.00 |0.00 |0.00 |
|grannysmith|0.00 |0.00 |93.41 |1.78 |0.72 |2.96 |0.00 |0.00 |0.00 |
|awfulbrown |33.33 |0.00 |0.00 |33.33 |0.00 |0.00 |0.00 |33.33 |0.00 |
---------------------------------------------------------------------------------------------------
1a. Using pandas, how do I transpose such that values for subtype
become the column headers and all values are sorted into a single row by NAME
?
1b. Having transposed, how should I fill any empty spaces in the table with 0.00
? (eg. candyapple
is 0.00% limegreen
)
type
instead of subtype
? Weighting for type is the sum of weights for its subtypes.|NAME |red |green |yellow |brown |
----------------------------------------------
|candyapple |100.00|0.00 |0.00 |0.00 |
|grannysmith|0.00 |95.19 |0.72 |2.96 |
|awfulbrown |33.33 |33.33 |33.33 |0.00 |
----------------------------------------------
2a. Having transposed as in (1), but using type
this time, how do I use pandas/python to sum the values such that weight for a given type
will be the sum of weights for its subtype
?
|NAME |red |green |yellow |brown |maroon|crimson|limegreen|deepgreen|goldyellow|lightbrown|maroon|plainyellow|deepgreen|
---------------------------------------------------------------------------------------------------
|candyapple |100.00|0.00 |0.00 |0.00 |93.35 |6.65 |0.00 |0.00 |0.00 |0.00 |0.00 |0.00 |0.00 |
|grannysmith|0.00 |95.19 |0.72 |2.96 |0.00 |0.00 |93.41 |1.78 |0.72 |2.96 |0.00 |0.00 |0.00 |
|awfulbrown |33.33 |33.33 |33.33 |0.00 |33.33 |0.00 |0.00 |33.33 |0.00 |0.00 |0.00 |33.33 |0.00 |
---------------------------------------------------------------------------------------------------
3a. Does pandas have a method to create, from the original dataset, the above combined DF of both aggregated sums of type
and the individual weights of the subtypes
?
Upvotes: 1
Views: 171
Reputation: 23099
Using pd.crosstab
subtypes = pd.crosstab(df.NAME,df.type,df.weight,aggfunc='sum')
types = pd.crosstab(df.NAME,df.subtype,df.weight,aggfunc='sum')
final = pd.concat([types,subtypes],axis=1)
print(subtypes)
subtype crimson deepgreen goldyellow lightbrown limegreen maroon \
NAME
awfulbrown NaN 33.33 NaN NaN NaN 33.33
candyapple 6.65 NaN NaN NaN NaN 93.35
grannysmith NaN 1.78 0.72 2.96 93.41 NaN
subtype plainyellow
NAME
awfulbrown 33.33
candyapple NaN
grannysmith NaN
print(types)
type brown green red yellow
NAME
awfulbrown NaN 33.33 33.33 33.33
candyapple NaN NaN 100.00 NaN
grannysmith 2.96 95.19 NaN 0.72
print(final.fillna(0))
brown green red yellow crimson deepgreen goldyellow \
NAME
awfulbrown 0.00 33.33 33.33 33.33 0.00 33.33 0.00
candyapple 0.00 0.00 100.00 0.00 6.65 0.00 0.00
grannysmith 2.96 95.19 0.00 0.72 0.00 1.78 0.72
lightbrown limegreen maroon plainyellow
NAME
awfulbrown 0.00 0.00 33.33 33.33
candyapple 0.00 0.00 93.35 0.00
grannysmith 2.96 93.41 0.00 0.00
Upvotes: 2
Reputation: 88246
For the first case a pivot
is enough, since there is no aggregation needed:
df.pivot('NAME', 'subtype', 'weight').fillna(0)
subtype crimson deepgreen goldyellow lightbrown limegreen maroon \
NAME
awfulbrown 0.00 33.33 0.00 0.00 0.00 33.33
candyapple 6.65 0.00 0.00 0.00 0.00 93.35
grannysmith 0.00 1.78 0.72 2.96 93.41 0.00
subtype plainyellow
NAME
awfulbrown 33.33
candyapple 0.00
grannysmith 0.00
For the second case you can use pivot_table
, and aggregate with the sum
:
df.pivot_table(index='NAME', columns='type', values='weight', aggfunc='sum', fill_value=0)
type brown green red yellow
NAME
awfulbrown 0.00 33.33 33.33 33.33
candyapple 0.00 0.00 100.00 0.00
grannysmith 2.96 95.19 0.00 0.72
Upvotes: 3