sectechguy
sectechguy

Reputation: 2117

Pandas Applying multiple greater than and less than grouping rows by specific column

I am creating 3 pandas dataframes based off of one original pandas dataframe. I have calculated standard deviations from the norm.

#Mean
stats_over_29000_mean = stats_over_29000['count'].mean().astype(int)

152542

#STDS
stats_over_29000_count_between_std = stats_over_29000_std - stats_over_29000_mean

54313

stats_over_29000_first_std = stats_over_29000_mean + stats_over_29000_count_between_std

206855

stats_over_29000_second_std = stats_over_29000_first_std + stats_over_29000_count_between_std

261168

stats_over_29000_third_std = stats_over_29000_second_std + stats_over_29000_count_between_std

315481

This works to get all rows from df under 2 stds

#Select all rows where count is less than 2 standard deviations 
stats_under_2_stds = stats_over_29000[stats_over_29000['count'] < stats_over_29000_second_std]

Next I would like to select all rows from df where >=2 stds and less than 3 stds

I have tried:

stats_2_and_over_under_3_stds = stats_over_29000[stats_over_29000['count'] >= stats_over_29000_second_std < stats_over_29000_third_std]

and

stats_2_and_over_under_3_stds = stats_over_29000[stats_over_29000['count'] >= stats_over_29000_second_std && < stats_over_29000_third_std]

But neither seem to work.

Upvotes: 4

Views: 24917

Answers (3)

PJ_
PJ_

Reputation: 643

The loc function allow you to apply multiple conditions to filter a dataframe in a very concise syntax. I'm putting in "column of interest" as I do not know the column name where you have the values stored. Alternatively if the column of interest is the index, you could just write a condition directly as (stats_over_29000 > 261168) inside the loc function.

    stats_over_29000.loc[(stats_over_29000('column of interest') > 261168) &\
 (stats_over_29000('column of interest') < 315481)]

Upvotes: 0

Sylvain
Sylvain

Reputation: 799

Pandas now has the Series.between(left, right, inclusive=True), that allows both both comparisons at the same time.

In your case:

stats_2_and_over_under_3_stds = \
  stats_over_29000[(stats_over_29000['count'].between(
  stats_over_29000_second_std, stats_over_29000_third_std)]

Upvotes: 6

Etienne Herlaut
Etienne Herlaut

Reputation: 586

This is how you filter on df with 2 conditions :

  • init df = pd.DataFrame([[1,2],[1,3],[1,5],[1,8]],columns=['A','B'])
  • operation : res = df[(df['B']<8) & (df['B']>2)]
  • result :

       A  B
    1  1  3
    2  1  5
    

In your case :

stats_2_and_over_under_3_stds = stats_over_29000[(stats_over_29000['count'] >= stats_over_29000_second_std) & (stats_over_29000['count'] < stats_over_29000_third_std)]

Upvotes: 8

Related Questions