some_programmer
some_programmer

Reputation: 3528

How to obtain all the rows of the top & bottom 3 values based on the values of a column?

I have a df as follows:

BacksGas_Flow_sccm  ContextID   StepID  Time_Elapsed        iso_forest  anomaly_score           alarm
96.875              7296124     19      39.798              -1          -0.22435033280902072    3
96.875              7296125     19      39.993              -1          -0.22435033280902072    3
96.875              7296406     19      39.829              -1          -0.22435033280902072    3
96.875              7296405     19      39.243              -1          -0.22435033280902072    3
96.6796875          7317148     19      38.801              -1          -0.22435033280902072    3
96.6796875          7317149     19      38.801              -1          -0.22435033280902072    3
96.58203125         7293851     19      40.226              -1          -0.22435033280902072    3
96.58203125         7293852     19      40.031000000000006  -1          -0.22435033280902072    3
96.38671875         7293732     19      39.945              -1          -0.22435033280902072    3
96.38671875         7293731     19      39.945              -1          -0.22435033280902072    3
95.80078125         7297416     19      39.666000000000004  -1          -0.22435033280902072    3
95.80078125         7297415     19      39.541000000000004  -1          -0.22435033280902072    3
18.5546875          7321507     19      38.107              -1          -0.25368125176672074    -3
18.5546875          7322950     19      37.734              -1          -0.25368125176672074    -3
18.45703125         7320222     19      37.906000000000006  -1          -0.25368125176672074    -3
18.45703125         7323150     19      37.755              -1          -0.25368125176672074    -3
18.45703125         7323151     19      38.02               -1          -0.25368125176672074    -3
18.45703125         7320221     19      38.069              -1          -0.25368125176672074    -3
18.359375           7291023     19      37.718              -1          -0.25420996401901275    -3
18.359375           7291024     19      37.933              -1          -0.25420996401901275    -3
18.26171875         7316192     19      38.741              -1          -0.25420996401901275    -3
18.26171875         7312681     19      38.084              -1          -0.25420996401901275    -3
18.26171875         7312682     19      37.830000000000005  -1          -0.25420996401901275    -3
18.26171875         7316191     19      37.679              -1          -0.25420996401901275    -3
18.1640625          7291050     19      38.299              -1          -0.25420996401901275    -3
18.1640625          7311617     19      38.031000000000006  -1          -0.25420996401901275    -3
18.1640625          7324929     19      38.119              -1          -0.25420996401901275    -3
18.1640625          7291049     19      37.841              -1          -0.25420996401901275    -3
18.1640625          7311618     19      38.031000000000006  -1          -0.25420996401901275    -3
18.1640625          7324930     19      38.119              -1          -0.25420996401901275    -3
18.06640625         7306076     19      38.098              -1          -0.25420996401901275    -3
18.06640625         7317385     19      37.967000000000006  -1          -0.25420996401901275    -3
18.06640625         7316312     19      38.169000000000004  -1          -0.25420996401901275    -3
18.06640625         7306077     19      38.098              -1          -0.25420996401901275    -3
18.06640625         7317386     19      37.967000000000006  -1          -0.25420996401901275    -3
18.06640625         7316311     19      38.169000000000004  -1          -0.25420996401901275    -3

I would like to obtain the all the rows belonging to the highest 3 and the least 3 values from the BacksGas_Flow_sccm column.

In the above df:

The highest 3 values in the BacksGas_Flow_sccm column are: 96.875, 96.6796875, 95.80078125

and

The lowest 3 values in the BacksGas_Flow_sccm column are: 18.06640625, 18.1640625, 18.26171875

Expected Output:

BacksGas_Flow_sccm  ContextID   StepID  Time_Elapsed    iso_forest  anomaly_score   alarm
    96.875  7296124 19  39.798  -1  -0.22435033280902072    3
    96.875  7296125 19  39.993  -1  -0.22435033280902072    3
    96.875  7296406 19  39.829  -1  -0.22435033280902072    3
    96.875  7296405 19  39.243  -1  -0.22435033280902072    3
    96.6796875  7317148 19  38.801  -1  -0.22435033280902072    3
    96.6796875  7317149 19  38.801  -1  -0.22435033280902072    3
    96.58203125 7293851 19  40.226  -1  -0.22435033280902072    3
    96.58203125 7293852 19  40.031000000000006  -1  -0.22435033280902072    3
    18.26171875 7316192 19  38.741  -1  -0.25420996401901275    -3
    18.26171875 7312681 19  38.084  -1  -0.25420996401901275    -3
    18.26171875 7312682 19  37.830000000000005  -1  -0.25420996401901275    -3
    18.26171875 7316191 19  37.679  -1  -0.25420996401901275    -3
    18.1640625  7291050 19  38.299  -1  -0.25420996401901275    -3
    18.1640625  7311617 19  38.031000000000006  -1  -0.25420996401901275    -3
    18.1640625  7324929 19  38.119  -1  -0.25420996401901275    -3
    18.1640625  7291049 19  37.841  -1  -0.25420996401901275    -3
    18.1640625  7311618 19  38.031000000000006  -1  -0.25420996401901275    -3
    18.1640625  7324930 19  38.119  -1  -0.25420996401901275    -3
    18.06640625 7306076 19  38.098  -1  -0.25420996401901275    -3
    18.06640625 7317385 19  37.967000000000006  -1  -0.25420996401901275    -3
    18.06640625 7316312 19  38.169000000000004  -1  -0.25420996401901275    -3
    18.06640625 7306077 19  38.098  -1  -0.25420996401901275    -3
    18.06640625 7317386 19  37.967000000000006  -1  -0.25420996401901275    -3
    18.06640625 7316311 19  38.169000000000004  -1  -0.25420996401901275    -3

I tried using pd.nlargest & pd.nsmallest but it gives me the wrong output.

How can this be done?

Thanks in advance

Upvotes: 1

Views: 34

Answers (1)

anky
anky

Reputation: 75080

You can achieve this by drop_duplicates() with nlargest and nsmallest combined:

s=df.BacksGas_Flow_sccm.drop_duplicates()
(df[df.BacksGas_Flow_sccm.isin(pd.concat([s.nlargest(3),s.nsmallest(3)]))]
                                                   .reset_index(drop=True))

BacksGas_Flow_sccm  ContextID   StepID  Time_Elapsed    iso_forest  anomaly_score   alarm
0   96.875000   7296124 19  39.798  -1  -0.22435    3
1   96.875000   7296125 19  39.993  -1  -0.22435    3
2   96.875000   7296406 19  39.829  -1  -0.22435    3
3   96.875000   7296405 19  39.243  -1  -0.22435    3
4   96.679688   7317148 19  38.801  -1  -0.22435    3
5   96.679688   7317149 19  38.801  -1  -0.22435    3
6   96.582031   7293851 19  40.226  -1  -0.22435    3
7   96.582031   7293852 19  40.031  -1  -0.22435    3
8   18.261719   7316192 19  38.741  -1  -0.25421    -3
9   18.261719   7312681 19  38.084  -1  -0.25421    -3
10  18.261719   7312682 19  37.830  -1  -0.25421    -3
....
....

Upvotes: 1

Related Questions