Remove outliers by group based on IQR

Question

I have a df that contains the following variables:

pp (participant)
condition
rt (reaction time)

(as well as a whole bunch of other stuff).

I want to trim outliers based on the IQR criterion. However, I want to do so per condition, per pp.
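For reference, the IQR criterion is usually applied through Tukey's fences: a value is kept if it lies in [Q1 - k·IQR, Q3 + k·IQR], where IQR = Q3 - Q1. The sketch below assumes the common choice k = 1.5 and pandas' default linear quantile interpolation. The helper name `iqr_bounds` is hypothetical, and it works on a single Series with no grouping yet.

```python
import pandas as pd


def iqr_bounds(s: pd.Series, k: float = 1.5) -> tuple[float, float]:
    """Return Tukey's (lower, upper) fences for one Series (k = 1.5 assumed)."""
    q1 = s.quantile(0.25)
    q3 = s.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Made-up reaction times: 100 should fall outside the fences.
rt = pd.Series([1, 2, 3, 4, 100])
lower, upper = iqr_bounds(rt)
trimmed = rt[rt.between(lower, upper)]  # between() is inclusive on both ends
```

Here Q1 = 2 and Q3 = 4, so the fences are -1 and 7 and only the 100 is dropped.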

I figure a solution would start with

grouped = df.groupby(['pp','condition'])

but then what? How do I remove the outliers per group? Do I use an apply function, or does the filter function help me out here?
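On the `filter` question: `DataFrameGroupBy.filter` calls your function once per group and expects a single boolean back, so it keeps or drops whole groups rather than individual rows. The toy data below is made up; only the column names mirror the question.

```python
import pandas as pd

# Hypothetical toy data with two participants in one condition.
df = pd.DataFrame({
    "pp": [1, 1, 1, 2, 2, 2],
    "condition": ["a"] * 6,
    "rt": [300, 310, 900, 400, 410, 420],
})

# The lambda returns one bool per group: True keeps every row of that
# group, False drops every row of it. pp 1 contains 900, so all of pp 1 goes.
kept = df.groupby(["pp", "condition"]).filter(lambda g: g["rt"].max() < 800)
```

That makes `filter` unsuitable for trimming individual outlying trials within a group.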

AJS · Accepted Answer

You could do something like this:

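import pandas as pd

# define a function that keeps only the rows of one group whose rt lies
# within the IQR fences [Q1 - 1.5*IQR, Q3 + 1.5*IQR] (1.5 is the usual factor)
def filter_condition(grped_df):
    q1 = grped_df['rt'].quantile(0.25)
    q3 = grped_df['rt'].quantile(0.75)
    iqr = q3 - q1
    return grped_df[grped_df['rt'].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]


grouped = df.groupby(by=['pp', 'condition'])

# use apply to pass each group to the function, then reset the index to drop the group multi-index
filtered_df = grouped.apply(filter_condition).reset_index(drop=True)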
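A possible alternative that avoids `apply` is to compute the per-group quartiles with `transform`, which broadcasts each group's result back to every row, and then index the original frame with a boolean mask. This keeps the original row labels, so you can still see which trials were removed. The function name `iqr_trim`, its defaults, and the factor k = 1.5 are assumptions for illustration.

```python
import pandas as pd


def iqr_trim(df, value_col="rt", group_cols=("pp", "condition"), k=1.5):
    """Drop rows whose value lies outside its own group's Tukey fences (hypothetical helper)."""
    grouped = df.groupby(list(group_cols))[value_col]
    # transform returns a Series aligned with df, one quartile value per row
    q1 = grouped.transform(lambda s: s.quantile(0.25))
    q3 = grouped.transform(lambda s: s.quantile(0.75))
    iqr = q3 - q1
    mask = df[value_col].between(q1 - k * iqr, q3 + k * iqr)
    return df[mask]


# Made-up data: pp 1 has one extreme rt, pp 2 has none.
toy = pd.DataFrame({
    "pp": [1] * 5 + [2] * 5,
    "condition": ["a"] * 10,
    "rt": [1, 2, 3, 4, 100, 10, 11, 12, 13, 14],
})
trimmed = iqr_trim(toy)
```

Only the row with rt = 100 (index 4) is dropped, and `toy.index.difference(trimmed.index)` lists the removed rows.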
