Reputation: 2117
I am creating 3 pandas dataframes based off of one original pandas dataframe. I have calculated standard deviations from the norm.
#Mean
stats_over_29000_mean = stats_over_29000['count'].mean().astype(int)
152542
#STDS
stats_over_29000_count_between_std = stats_over_29000_std - stats_over_29000_mean
54313
stats_over_29000_first_std = stats_over_29000_mean + stats_over_29000_count_between_std
206855
stats_over_29000_second_std = stats_over_29000_first_std + stats_over_29000_count_between_std
261168
stats_over_29000_third_std = stats_over_29000_second_std + stats_over_29000_count_between_std
315481
This works to get all rows from df under 2 stds
#Select all rows where count is less than 2 standard deviations
stats_under_2_stds = stats_over_29000[stats_over_29000['count'] < stats_over_29000_second_std]
Next I would like to select all rows from df where >=2 stds and less than 3 stds
I have tried:
stats_2_and_over_under_3_stds = stats_over_29000[stats_over_29000['count'] >= stats_over_29000_second_std < stats_over_29000_third_std]
and
stats_2_and_over_under_3_stds = stats_over_29000[stats_over_29000['count'] >= stats_over_29000_second_std && < stats_over_29000_third_std]
But neither seem to work.
Upvotes: 4
Views: 24917
Reputation: 643
The loc
function allow you to apply multiple conditions to filter a dataframe in a very concise syntax. I'm putting in "column of interest" as I do not know the column name where you have the values stored. Alternatively if the column of interest is the index, you could just write a condition directly as (stats_over_29000 > 261168)
inside the loc function.
stats_over_29000.loc[(stats_over_29000('column of interest') > 261168) &\
(stats_over_29000('column of interest') < 315481)]
Upvotes: 0
Reputation: 799
Pandas now has the Series.between(left, right, inclusive=True)
, that allows both both comparisons at the same time.
In your case:
stats_2_and_over_under_3_stds = \
stats_over_29000[(stats_over_29000['count'].between(
stats_over_29000_second_std, stats_over_29000_third_std)]
Upvotes: 6
Reputation: 586
This is how you filter on df with 2 conditions :
df = pd.DataFrame([[1,2],[1,3],[1,5],[1,8]],columns=['A','B'])
res = df[(df['B']<8) & (df['B']>2)]
result :
A B
1 1 3
2 1 5
In your case :
stats_2_and_over_under_3_stds = stats_over_29000[(stats_over_29000['count'] >= stats_over_29000_second_std) & (stats_over_29000['count'] < stats_over_29000_third_std)]
Upvotes: 8