Reputation: 1
I have a pandas dataframe with the following columns: Position, Control, Patient, REFGENE, REFGROUP, and a large number of rows of methylation data. Here is the first row of the dataframe:
Position Controls Patients REFGENE REFGROUP
16:53468112 0.598153 0.422916 gene_name TSS1500
I want to investigate the difference in methylation between control and patient, so I create a new column for the difference:
df['diff_methylation'] = (df['Patient']) - (df['Control'])
Here is my problem. I want to create a new column that flags the values in the "diff_methylation" column that fall outside the range -0.2 to 0.2. The statement below should return True where the value is below -0.2 or above 0.2, but it returns False for every row in the dataframe. Could this have something to do with the negative value? Also, is there an easier way than creating another column, for example removing rows directly from the dataframe based on this condition?
df['Results'] = (df['diff_methylation'] > float(0.2)) & (df['diff_methylation'] < float(-0.2))
I know I have at least some values above 0.2, so some rows should come out as True.
I have searched the web but cannot find an example of filtering on a range that spans a negative and a positive value in a pandas dataframe.
Kindly
idama
Upvotes: 0
Views: 1886
Reputation: 5026
You can check your condition with
df['diff_methylation'].abs() > .2
or use the bitwise or operator, |:
(df['diff_methylation'] > .2) | (df['diff_methylation'] < -.2)
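To also cover the second part of the question (dropping rows directly instead of adding a column), here is a minimal sketch. It assumes the column names from the question's code (Control, Patient); apart from the first row, the sample values are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data: only the first row is taken from the question.
df = pd.DataFrame({
    "Control": [0.598153, 0.30, 0.50, 0.80],
    "Patient": [0.422916, 0.65, 0.55, 0.40],
})
df["diff_methylation"] = df["Patient"] - df["Control"]

# True where the difference is below -0.2 or above 0.2.
mask = df["diff_methylation"].abs() > 0.2

# Same condition written with the bitwise or operator.
mask_or = (df["diff_methylation"] > 0.2) | (df["diff_methylation"] < -0.2)

# Keep only the rows outside the -0.2 to 0.2 range, without an extra column.
filtered = df[mask]
print(filtered)
```

Indexing with the boolean mask returns a new dataframe; use df[~mask] instead if you want the rows inside the range.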
Upvotes: 0
Reputation: 287
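Your approach is almost correct. However, "&" means "and", and no value can be both greater than 0.2 and less than -0.2 at the same time, so every row comes back False. It has nothing to do with the negative value. Use "|" ("or") instead: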
df['Results']= (df['diff_methylation'] > float(0.2)) | (df['diff_methylation'] < float(-0.2))
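A small sketch of the difference between the two operators, using a made-up Series of differences (the values are hypothetical, not from the question's data). It also shows Series.between as one possible alternative way to write the same test.

```python
import pandas as pd

# Hypothetical differences, chosen to cover both sides of the range.
diff = pd.Series([-0.40, -0.10, 0.05, 0.35])

# "and": no value is both above 0.2 and below -0.2, so this is all False.
both = (diff > 0.2) & (diff < -0.2)

# "or": True for values above 0.2 or below -0.2.
either = (diff > 0.2) | (diff < -0.2)

# Alternative: negate between(), which includes both endpoints by default.
outside = ~diff.between(-0.2, 0.2)

print(both.any())       # whether any row satisfied the "and" condition
print(either.tolist())
```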
Upvotes: 1