Reputation: 41
I am trying to get rid of all values of 0 from my AFR_META column.
head(data1)
CHR BP REF ALT AFR_META LOG SNPid
1 1 11063 T G 0.0002751 8.19838 1
2 1 13259 G A 0.0002778 8.18861 2
3 1 17641 G A 0.0008361 7.08676 3
I have previously used the following on a separate dataset that worked great:
data1<-data1[-which(data1$AFR_META==0),]
But for some reason, on this column (which is a subset of the other data set), the same command wipes out my entire data frame:
data1<-data1[-which(data1$AFR_META==0),]
head(data1)
[1] CHR BP REF ALT AFR_META LOG SNPid
<0 rows> (or 0-length row.names)
I'm not sure why it's acting differently. They are both numeric columns, and I double-checked using sapply:
sapply(data1, class)
CHR BP REF ALT AFR_META LOG SNPid
"numeric" "integer" "factor" "factor" "numeric" "numeric" "integer"
Any guidance and help would be nice! I'm working within R right now, but could run it in Linux using awk with help. I tried awk earlier but didn't have luck writing out the right filter. Sorry, I'm new to this and I've been spinning in small circles over this. Finally asking for help! Thank you all so much.
Upvotes: 0
Views: 163
Reputation: 3223
Maybe you have already solved your problem with Ronak Shah's comment; it is indeed not a good idea to compare floating-point values to zero.
However, if you intend to remove all rows where AFR_META is exactly zero, you run into a problem when there is no such row. In that case which(data1$AFR_META==0)
returns integer(0),
and indexing with -integer(0) selects zero rows, i.e. it removes all rows of your data frame.
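A minimal sketch reproducing the pitfall, with made-up values (the data frame and values here are illustrative, not your actual data):

```r
# A small frame with no zeros, like the head() of your data
df <- data.frame(AFR_META = c(0.0002751, 0.0002778, 0.0008361))

idx <- which(df$AFR_META == 0)  # no match, so idx is integer(0)
idx                             # integer(0)

# -integer(0) is still integer(0), so this selects zero rows
df[-idx, ]                      # <0 rows> (or 0-length row.names)
```

This is exactly the "<0 rows>" output you saw: on the earlier data set there happened to be at least one zero, so the negative index worked; here there is none.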
Instead of using the indices you can simply use the logical vector for selecting the rows to be removed:
data1 <- data1[!data1$AFR_META==0,]
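The logical vector is safe in both cases, a quick sketch with illustrative values:

```r
# One frame with a zero, one without
with_zero    <- data.frame(AFR_META = c(0.0002751, 0, 0.0008361))
without_zero <- data.frame(AFR_META = c(0.0002751, 0.0002778, 0.0008361))

with_zero[!with_zero$AFR_META == 0, ]        # drops only the zero row
without_zero[!without_zero$AFR_META == 0, ]  # keeps all rows, nothing lost
```

Equivalently you can write the condition as data1$AFR_META != 0; both produce a logical vector whose length always matches the number of rows.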
For data manipulations I suggest using package dplyr:
library(dplyr)
# filtering all zero values
filter(data1, AFR_META != 0)
# or, addressing the floating-point issue
filter(data1, !between(AFR_META, -0.0001, 0.0001))
Upvotes: 1