Adding a new pandas dataframe columns populating it with conditional calculations (mean if, standard deviation if)

Question

I need to add 2 columns to a Pandas data frame populated with conditional mean and standard deviation.

# Import pandas library 
import pandas as pd 

# Initialize list of lists 
data = [
    ['Tom',   'Blue', 'Small',  10, ], 
    ['Tom',   'Blue', 'Big',    15, ], 
    ['Tom',   'Red',  'Big',     3, ], 
    ['Tom',   'Red',  'Small',   8, ], 
    ['Tom',   'Red',  'Medium',  5, ], 
    ['Tom',   'Blue', 'Medium', 12, ], 
    ['Nick',  'Blue', 'Small',  15, ], 
    ['Nick',  'Red',  'Medium', 11, ], 
    ['Nick',  'Red',  'Small',  17, ], 
    ['Julie', 'Blue', 'Big',    14, ], 
    ['Julie', 'Red',  'Big',    11, ],
    ['Julie', 'Red',  'Medium',  3, ], 
    ['Julie', 'Red',  'Medium',  4, ],
]

# Create the pandas DataFrame 
df = pd.DataFrame(data, columns = ['Name', 'Colour', 'Size', 'Age']) 

# print dataframe. 
df

In this example, I need to add 2 columns: df['mean'] and df['std'] conditional to the name and colour only.

I looked at .mean() and .std() documentation but didn't find a way to add a set of conditions. I believe I might need 2 functions that, for each row, takes the name and the colour as arguments, calculate the mean or std over the whole series, and then populate the new columns.

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.std.html

I believe I need a function that dynamically checks the name and the colour for each row, because, unlike this example data frame, my application is for a data frame that has thousands and names and thousands of colours.

I know how to apply .mean or .std to a whole pandas series, without conditions:

df['Age'].std()

or

df['Age'].mean()

I also found that a group by + transformation does something slightly similar without the capacity to dynamically check for each row like a function could.

df['mean'] = df.groupby(['Name','Colour']).transform('mean')

Thank you for your help.

Here is the output I expected. Thanks for the answers: Output dataframe

Adding a new pandas dataframe columns populating it with conditional calculations (mean if, standard deviation if)

Answers (1)

Related Questions