Reputation: 399
I have a df that contains the following variables:
(as well as a whole bunch of other stuff).
I want to trim outliers based on the IQR criterion. However, I want to do so separately for each condition within each pp.
I figure a solution would start with:
grouped = df.groupby(['pp','condition'])
but then what? How do I remove the outliers per group? Do I use an apply function, or does the filter function help me out here?
Upvotes: 1
Views: 1104
Reputation: 2023
You could do something like this:
    # define a function to filter out your data
    def filter_condition(grped_df):
        if some_condition:
            return grped_df[some_condition]
        return grped_df

    grouped = df.groupby(by=['pp','condition'])
    # use apply to pass each group to your defined function and reset index to remove grouped multi index.
    filtered_df = grouped.apply(filter_condition).reset_index(drop=True)
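To make the placeholder condition concrete, here is a minimal sketch of one way the IQR criterion could be applied per group. The question does not show which column holds the measurements, so the column name `rt`, the helper name `trim_iqr_outliers`, and the fence multiplier `k=1.5` (the conventional Tukey value) are all assumptions for illustration. It uses `transform` rather than `apply`, so the per-group quartiles line up with the original rows and the original index is kept.

```python
import pandas as pd


def trim_iqr_outliers(df, value_col, group_cols, k=1.5):
    """Keep rows whose value lies within [Q1 - k*IQR, Q3 + k*IQR] of their own group.

    Hypothetical helper; value_col and k are assumptions, not taken from the question.
    """
    grouped = df.groupby(group_cols)[value_col]
    # transform broadcasts each group's quartile back onto that group's rows
    q1 = grouped.transform(lambda s: s.quantile(0.25))
    q3 = grouped.transform(lambda s: s.quantile(0.75))
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    # rows with a missing value compare as False and are dropped as well
    keep = (df[value_col] >= lower) & (df[value_col] <= upper)
    return df[keep]


# Made-up example data: 100 is extreme for pp 1 but typical for pp 2
df = pd.DataFrame({
    "pp": [1] * 5 + [2] * 5,
    "condition": ["A"] * 10,
    "rt": [10, 11, 12, 13, 100, 100, 101, 102, 103, 104],
})
trimmed = trim_iqr_outliers(df, "rt", ["pp", "condition"])
```

In this example the value 100 is removed from the pp 1 group but kept in the pp 2 group, because the fences are computed separately for each (pp, condition) combination. Note that `groupby(...).filter` would not help here, since it keeps or drops whole groups rather than individual rows.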
Upvotes: 1