Dana Al-Hindi

Reputation: 41

Loop to filter values of 0 in one column

I am trying to get rid of all rows with a value of 0 in my AFR_META column.

head(data1)
  CHR    BP REF ALT  AFR_META     LOG SNPid
1   1 11063   T   G 0.0002751 8.19838     1
2   1 13259   G   A 0.0002778 8.18861     2
3   1 17641   G   A 0.0008361 7.08676     3

I have previously used the following on a separate dataset, and it worked great:

data1<-data1[-which(data1$AFR_META==0),]

But for some reason, on this data set (which is a subset of the other one), it doesn't work: instead of dropping the zero rows, it wipes out my whole data frame.

data1<-data1[-which(data1$AFR_META==0),]
head(data1)
[1] CHR      BP       REF      ALT      AFR_META LOG      SNPid   
<0 rows> (or 0-length row.names)

I'm not sure why it's acting differently. Both are numeric columns; I double-checked using sapply:

sapply(data1, class)
      CHR        BP       REF       ALT  AFR_META       LOG     SNPid 
"numeric" "integer"  "factor"  "factor" "numeric" "numeric" "integer"

Any guidance or help would be appreciated! I'm working in R right now, but I could also do this in Linux with awk if someone can help; I tried awk earlier but couldn't get the filter right. Sorry, I'm new to this and have been going in circles over this. Finally asking for help! Thank you all so much.

Upvotes: 0

Views: 163

Answers (1)

MarkusN

Reputation: 3223

Maybe Ronak Shah's comment has already solved your problem: it is indeed not a good idea to compare a floating-point value to zero with ==.
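To illustrate why an exact == 0 test can miss values that are zero only up to rounding, here is a minimal base-R sketch. The toy data frame and the tolerance tol = 1e-9 are hypothetical assumptions, not taken from the question; in practice the tolerance should sit well below the smallest genuine frequency in your data.

```r
# Hypothetical toy data: the third AFR_META value is computed, so it is
# mathematically 0 but actually about 5.55e-17 in floating point
data1 <- data.frame(SNPid    = 1:4,
                    AFR_META = c(0.0002751, 0, 0.1 + 0.2 - 0.3, 0.0008361))

# Exact comparison only catches the literal 0 in row 2
print(data1$AFR_META == 0)

# Assumed tolerance, far below any real allele frequency in this toy data
tol <- 1e-9

# Keep only rows whose value is clearly nonzero
data1 <- data1[abs(data1$AFR_META) > tol, ]
print(data1)
```

Only the rows with SNPid 1 and 4 remain; the exact test alone would have kept the near-zero row 3.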

However, if you intend to remove all rows where AFR_META is exactly zero, you have a problem when there is no such row. In that case which(data1$AFR_META==0) returns integer(0); negating it still gives integer(0), and indexing a data frame with an empty index selects no rows, so every row of your data frame is removed.
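A minimal reproduction of this pitfall, using a hypothetical three-row data frame (values copied from the head() output in the question) in which no AFR_META value is exactly 0:

```r
# Hypothetical toy data: no AFR_META value is exactly 0
df <- data.frame(SNPid    = 1:3,
                 AFR_META = c(0.0002751, 0.0002778, 0.0008361))

idx <- which(df$AFR_META == 0)
print(idx)            # integer(0): no row matches
print(-idx)           # still integer(0): negating an empty vector changes nothing

emptied <- df[-idx, ] # an empty row index selects no rows at all
print(nrow(emptied))  # 0, the same "<0 rows>" result as in the question
```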

Instead of using indices, you can simply use a logical vector that selects the rows to keep:

data1 <- data1[!data1$AFR_META==0,]
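On the same kind of toy data, the logical version keeps every row when nothing matches. The sketch below also shows a guarded which() as an alternative, and covers a case the question does not mention: if AFR_META could contain NA (a hypothetical here), an NA in a logical row index produces a row of NAs, so NA values should be excluded explicitly; subset() treats an NA condition as FALSE. All data frames below are made up for illustration.

```r
# Hypothetical toy data with no exact zeros
df <- data.frame(SNPid    = 1:3,
                 AFR_META = c(0.0002751, 0.0002778, 0.0008361))

# Logical index: every element of df$AFR_META == 0 is FALSE, so ! keeps all rows
kept <- df[!df$AFR_META == 0, ]
print(nrow(kept))    # 3

# Alternative: only apply the negative index when which() found something
idx <- which(df$AFR_META == 0)
guarded <- if (length(idx) > 0) df[-idx, ] else df
print(nrow(guarded)) # 3

# Hypothetical data with an NA: drop NA rows explicitly alongside the zeros
df_na <- data.frame(SNPid    = 1:3,
                    AFR_META = c(0.0002751, NA, 0))
no_na <- df_na[!is.na(df_na$AFR_META) & df_na$AFR_META != 0, ]

# subset() treats an NA condition as FALSE, so it keeps the same single row
via_subset <- subset(df_na, AFR_META != 0)
print(via_subset)
```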

For data manipulation, I suggest using the dplyr package:

library(dplyr)

# filtering all zero values
filter(data1, AFR_META != 0)

# or, addressing the floating-point issue
filter(data1, !between(AFR_META, -0.0001, 0.0001))

Upvotes: 1
