idamaa

Reputation: 1

How to filter a pandas dataframe between a negative and a positive value (-0.2 to 0.2) and remove the rows that meet the condition?

I have a pandas dataframe with the following columns: Position, Control, Patient, REFGENE, REFGROUP, and a large number of rows of methylation data. Here is the first row of the dataframe:

Position        Controls        Patients        REFGENE        REFGROUP

16:53468112     0.598153        0.422916        gene_name      TSS1500

I want to investigate the difference in methylation between control and patient so I create a new column for the difference:

df['diff_methylation'] = (df['Patient']) - (df['Control'])

Here is my problem. I want to create a new column that excludes values between -0.2 and 0.2 in the "diff_methylation" column. The statement below should return True where the value is below -0.2 or above 0.2, but it returns False for every row in the dataframe. I wonder if it has something to do with the negative value? Also, rather than creating another column, is there an easier way to remove the rows that come back as True directly from the dataframe?

df['Results'] = (df['diff_methylation']  > float(0.2)) & (df['diff_methylation'] < float(-0.2))

I know I have at least some values above 0.2, so some rows should come out as True.

I have searched the web but cannot find anyone filtering on a range between a negative and a positive value in a pandas dataframe.

Kindly

idama

Upvotes: 0

Views: 1886

Answers (2)

Michael Szczesny

Reputation: 5026

You can check your condition with

df['diff_methylation'].abs() > .2

or combine the two comparisons with the bitwise OR operator |:

(df['diff_methylation']  > .2) | (df['diff_methylation'] < -.2)
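To address the second part of the question (dropping rows rather than only flagging them), the boolean mask can be used to index the dataframe directly. This is a minimal sketch: the sample values are made up for illustration, and only the column names Control, Patient and diff_methylation come from the question.

```python
import pandas as pd

# Made-up sample values; only the column names come from the question.
df = pd.DataFrame({
    "Control": [0.60, 0.50, 0.30, 0.45],
    "Patient": [0.42, 0.75, 0.65, 0.50],
})
df["diff_methylation"] = df["Patient"] - df["Control"]

# True where the difference lies outside the band [-0.2, 0.2]
outside = df["diff_methylation"].abs() > 0.2

kept = df[outside]      # rows with a difference larger than 0.2 in magnitude
dropped = df[~outside]  # rows whose difference falls inside the band
```

Use df[outside] or df[~outside] depending on which set of rows you want to keep; no extra Results column is needed.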

Upvotes: 0

NHL

Reputation: 287

Your approach is almost correct. However, "&" means "and", and no value can be both greater than 0.2 and less than -0.2, so the result is always False. Use "|" ("or") instead:

df['Results']= (df['diff_methylation'] > float(0.2)) | (df['diff_methylation'] < float(-0.2))
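A small sketch of the difference between the two operators, using a made-up Series in place of the diff_methylation column. It also shows Series.between as one possible alternative; note that between is inclusive at both ends by default, and that a NaN difference would behave differently under the negated between than under the two comparisons.

```python
import pandas as pd

# Illustrative values standing in for df['diff_methylation']
diff = pd.Series([-0.35, -0.18, 0.05, 0.25])

# "and": a value cannot be > 0.2 and < -0.2 at once, so this is all False
both = (diff > 0.2) & (diff < -0.2)

# "or": True for values outside the band [-0.2, 0.2]
either = (diff > 0.2) | (diff < -0.2)

# Alternative: negate an inclusive range check
not_between = ~diff.between(-0.2, 0.2)
```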

Upvotes: 1
