Reputation: 455
I've got a dataset that looks like this:
x y
112.21 234.511
56.22 1.1111
3.456 2.31
1.1 2.4567
3.411 4.5
I want to subset this dataset, keeping only the rows where both x and y have 2 or more decimal places. So the end result will be this:
x y
112.21 234.511
56.22 1.1111
3.456 2.31
# the last two rows are removed because they have values with fewer than 2 decimal places
I tried doing something of this sort, but it doesn't work properly (it keeps some 1 decimal place values):
edited_df <- df[grep("\\.[1-9][1-9]", df$x) && grep("\\.[1-9][1-9]", df$y)]
How can I do this?
Upvotes: 1
Views: 279
Reputation: 270045
Assuming that d is as in the Note at the end, create a function decimals that returns TRUE for each element of its vector argument that has 2+ decimals (and FALSE otherwise) or, if given a data frame argument, applies that test to each column. Use it to subset d.
decimals <- function(x) sapply(x, grepl, pattern = r"{\.\d\d}")
subset(d, decimals(x) & decimals(y))
## x y
## 1 112.210 234.5110
## 2 56.220 1.1111
## 3 3.456 2.3100
Or, if d can have an unknown number of numeric columns or different column names, replace the last line with:
subset(d, apply(decimals(d), 1, all))
Note:
Lines <- "
x y
112.21 234.511
56.22 1.1111
3.456 2.31
1.1 2.4567
3.411 4.5"
d <- read.table(text = Lines, header = TRUE)
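For what it's worth, here is a hedged sketch of why the attempt in the question misbehaves, with a minimal repair that keeps its structure. grep returns the indices of the matching elements rather than a logical vector; && is the scalar operator (older R versions looked only at the first elements, and R 4.3 and later raise an error for longer vectors); the row index is missing its trailing comma; and [1-9] does not match the digit 0. The name has2 is a hypothetical helper introduced only for this illustration, and the data frame is rebuilt inline so the block runs on its own.

```r
# Rebuild the example data so this block is self-contained.
df <- data.frame(
  x = c(112.21, 56.22, 3.456, 1.1, 3.411),
  y = c(234.511, 1.1111, 2.31, 2.4567, 4.5)
)

# has2 (hypothetical name): TRUE where the text form of the value has a
# point followed by at least two digits; grepl() coerces numbers to text.
has2 <- function(v) grepl("\\.[0-9][0-9]", v)

# Element-wise & combines the two logical vectors; the trailing comma
# makes the index select rows rather than columns.
edited_df <- df[has2(df$x) & has2(df$y), ]
edited_df
```

On the sample data this keeps rows 1 to 3. Note that the test is applied to the character form of each number, so trailing zeros that were never stored (2.50 is held as 2.5) cannot be counted.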
Upvotes: 1
Reputation: 73562
Use nchar on the "suffix" after the decimal point, which you can extract with gsub.
d[rowSums(nchar(sapply(d, gsub, pa="^.*\\.", re="")) > 1) > 1, ]
# x y
# 1 112.210 234.5110
# 2 56.220 1.1111
# 3 3.456 2.3100
If gsub were vectorized with Vectorize, like so:
g <- Vectorize(gsub)
we could write this approach slightly more succinctly:
d[rowSums(nchar(g(pa="^.*\\.", re="", d)) > 1) > 1, ]
# x y
# 1 112.210 234.5110
# 2 56.220 1.1111
# 3 3.456 2.3100
Data:
d <- structure(list(x = c(112.21, 56.22, 3.456, 1.1, 3.411), y = c(234.511,
1.1111, 2.31, 2.4567, 4.5)), class = "data.frame", row.names = c(NA,
-5L))
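One caveat worth sketching: when a value has no decimal point at all, the pattern ^.*\\. matches nothing, so gsub returns the whole string and a number such as 12 would be counted as having two decimals. The variant below is a minimal sketch under the assumption that the values convert to plain fixed notation (no scientific notation). n_decimals and d2 are hypothetical names introduced here, and the sixth row is made up purely to show the edge case. It also compares against ncol(d2), so it is not tied to exactly two columns.

```r
# Sample data from above plus one made-up row whose x has no decimal part.
d2 <- data.frame(
  x = c(112.21, 56.22, 3.456, 1.1, 3.411, 12),
  y = c(234.511, 1.1111, 2.31, 2.4567, 4.5, 3.14159)
)

# n_decimals (hypothetical helper): drop everything up to and including an
# optional decimal point, then count the characters that remain.
n_decimals <- function(v) nchar(sub("^[^.]*\\.?", "", as.character(v)))

# Keep rows in which every column has at least 2 decimals.
res <- d2[rowSums(sapply(d2, n_decimals) >= 2) == ncol(d2), ]
res
```

Here the made-up sixth row is dropped along with rows 4 and 5, because n_decimals gives 0 for a value without a decimal point.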
Upvotes: 1