Reputation: 20928
How would I remove rows in a dataframe whose values are within a certain threshold?
x y
1 -0.111111e-15 0.111111e-15
2 -1.111112e-15 1.111112e-15
3 -1.111111e-15 1.111111e-15
For example, if I set the threshold to 1e-8, the second or third row would be removed.
Upvotes: 2
Views: 99
Reputation: 21621
A similar approach using dplyr that would work on both data.table and data.frame:
dfrm<-data.frame(id=letters[1:3],x=c(-1/9/1e15,-1/9/1e14,-1/9/1e14),
y=c(1/9/1e15,1/9/1e14,1/9/1e14))
library(dplyr)
dfrm %>%
# select only numeric columns
select(which(sapply(., is.numeric))) %>%
# remove rows
slice(which(!duplicated(round(., 8)))) %>%
# right join the result with original dataset (get back unselected non-numeric columns)
right_join(dfrm, .)
Upvotes: 2
Reputation: 34703
Here's a possible data.table approach (if your data is currently a data.frame df, just set dt <- data.table(df)).
A more complicated version of your data, with a non-numeric column:
library(data.table)
dt <- data.table(id=letters[1:3],
x=c(-1/9/1e15,-1/9/1e14,-1/9/1e14),
y=c(1/9/1e15,1/9/1e14,1/9/1e14))
Now we just round all the numeric columns to your threshold and find unique rows:
indx <- names(dt)[sapply(dt, is.numeric)] ## Find numeric columns
unique(dt[, lapply(.SD, round, 8), .SDcols = indx])
# x y
# 1: 0 0
Alternatively, you can keep both the numeric and non-numeric columns while checking uniqueness on the numeric columns only:
unique(dt[, (indx) := lapply(.SD, round, 8), .SDcols = indx], by = indx)
# id x y
# 1: a 0 0
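Note that := updates dt by reference, so the call above overwrites x and y with their rounded values. If you would rather keep the original numbers, here is a minimal sketch that rounds a temporary copy and uses it only to decide which rows to keep. The names num_cols, rounded and result are just illustrative, not part of the answer above.

```r
library(data.table)

dt <- data.table(id = letters[1:3],
                 x = c(-1/9/1e15, -1/9/1e14, -1/9/1e14),
                 y = c(1/9/1e15, 1/9/1e14, 1/9/1e14))

# Names of the numeric columns
num_cols <- names(dt)[sapply(dt, is.numeric)]

# Rounded copy of the numeric columns; dt itself is not modified
rounded <- dt[, lapply(.SD, round, 8), .SDcols = num_cols]

# Keep the first row of each group whose rounded values coincide
result <- dt[!duplicated(rounded)]
```

Here result should hold the single row with id "a" and its unrounded x and y, while dt still has all three rows.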
Upvotes: 4
Reputation: 263301
I input console output with a little utility function rd.txt:
> dat <- rd.txt(" x y
+ 1 -0.111111e-15 0.111111e-15
+ 2 -1.111112e-15 1.111112e-15
+ 3 -1.111111e-15 1.111111e-15"
+ )
> dat[ ! duplicated( round(dat, 8) ),]
x y
1 -1.11111e-16 1.11111e-16
(My first version, with a minus sign rather than a negation operator, was not correct.) This would need some modification if not all the columns were numeric. If that's the case, then please post a proper test example, preferably with dput() output rather than console output, which is often ambiguous.
With the example from the other respondent (modified to deliver the requested object class):
dfrm<-data.frame(id=letters[1:3],x=c(-1/9/1e15,-1/9/1e14,-1/9/1e14),
y=c(1/9/1e15,1/9/1e14,1/9/1e14))
dfrm[ ! duplicated( round( dfrm[ , sapply(dfrm, is.numeric)],8)), ]
id x y
1 a -1.111111e-16 1.111111e-16
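If you need this in more than one place, the same idea can be wrapped in a small function. This is only a sketch: drop_near_duplicates and its digits argument are hypothetical names of my own, and it assumes the data frame has at least one numeric column.

```r
# Hypothetical helper: drop rows whose numeric columns coincide
# after rounding to `digits` decimal places (first occurrence is kept).
drop_near_duplicates <- function(df, digits = 8) {
  # Flag the numeric columns
  is_num <- vapply(df, is.numeric, logical(1))
  # Round only those columns; non-numeric columns are ignored for comparison
  rounded <- round(df[, is_num, drop = FALSE], digits)
  # Return the original (unrounded) rows that are not rounded duplicates
  df[!duplicated(rounded), , drop = FALSE]
}

dfrm <- data.frame(id = letters[1:3],
                   x = c(-1/9/1e15, -1/9/1e14, -1/9/1e14),
                   y = c(1/9/1e15, 1/9/1e14, 1/9/1e14))

kept <- drop_near_duplicates(dfrm, digits = 8)
```

Keep in mind that rounding sorts values into bins of width 10^-digits, so two values that straddle a bin edge (say 0.49e-8 and 0.51e-8) are treated as distinct even though they differ by less than the threshold.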
Upvotes: 5