fifigoblin

Reputation: 455

Grep based on decimal places in two columns

I've got a dataset that looks like this:

x        y 
112.21   234.511
56.22    1.1111
3.456    2.31 
1.1      2.4567 
3.411    4.5

I want to keep only the rows in which both x and y have 2 or more decimal places. The end result should be:

x        y 
112.21   234.511
56.22    1.1111
3.456    2.31 

# the last two rows are removed because each contains a value with fewer than 2 decimal places

I tried something like this, but it doesn't work properly (it keeps some values with only 1 decimal place):

edited_df <- df[grep("\\.[1-9][1-9]", df$x) && grep("\\.[1-9][1-9]", df$y)] 

How can I do this?

Upvotes: 1

Views: 279

Answers (2)

G. Grothendieck

Reputation: 270045

Assuming that d is as in the Note at the end, create a function decimals that returns TRUE for each element of its vector argument that has 2+ decimals and FALSE otherwise. If given a data frame, it applies that test to each column. Use it to subset d.

# r"{...}" is a raw string (R >= 4.0.0); the pattern matches a point followed by two digits
decimals <- function(x) sapply(x, grepl, pattern = r"{\.\d\d}")

subset(d, decimals(x) & decimals(y))
##         x        y
## 1 112.210 234.5110
## 2  56.220   1.1111
## 3   3.456   2.3100

Or, if d can have an unknown number of numeric columns or different column names, replace the last line with:

subset(d, apply(decimals(d), 1, all))
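
For comparison, here is a minimal sketch of how the attempt in the question could be repaired with the same idea. It assumes the data frame is called df as in the question; the helper name has2 is made up for illustration. As far as I can tell, three things go wrong in the original line: grep returns the indices of the matches rather than one TRUE/FALSE per row, && is meant for length-one values (it raises an error on longer vectors from R 4.3.0 on), and without a comma inside [ ] the index selects columns instead of rows. Note also that [1-9] does not match 0, so a value such as 1.05 would be missed.

```r
# Assumed sample data, copied from the question.
df <- data.frame(
  x = c(112.21, 56.22, 3.456, 1.1, 3.411),
  y = c(234.511, 1.1111, 2.31, 2.4567, 4.5)
)

# Hypothetical helper: TRUE where the printed value has a point followed by
# at least two digits. grepl() gives one logical value per element.
has2 <- function(v) grepl("\\.\\d{2}", as.character(v))

# `&` combines the two logical vectors element-wise.
keep <- has2(df$x) & has2(df$y)

# The trailing comma makes this a row selection.
edited_df <- df[keep, ]
```

This relies on how R converts the numbers to text, so it is only as reliable as that conversion (for example, trailing zeros are not preserved).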

Note

Lines <- "
x        y 
112.21   234.511
56.22    1.1111
3.456    2.31 
1.1      2.4567 
3.411    4.5"
d <- read.table(text = Lines, header = TRUE)

Upvotes: 1

jay.sf

Reputation: 73562

Use nchar on the "suffix" after the decimal point, which you can extract with gsub.

d[rowSums(nchar(sapply(d, gsub, pattern="^.*\\.", replacement="")) > 1) > 1, ]
#         x        y
# 1 112.210 234.5110
# 2  56.220   1.1111
# 3   3.456   2.3100

If gsub were vectorized like so:

g <- Vectorize(gsub)

we could write this slightly more succinctly:

d[rowSums(nchar(g(pattern="^.*\\.", replacement="", d)) > 1) > 1, ]
#         x        y
# 1 112.210 234.5110
# 2  56.220   1.1111
# 3   3.456   2.3100
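
The one-liners above can be unpacked into separate steps, which may make them easier to adapt. The following is only a sketch: the helper name n_decimals is made up for illustration, and the comparison against ncol(d) is my own substitution for the hard-coded > 1 so that it also works with more than two columns. It also guards against one case the suffix trick does not cover: a whole number such as 12 contains no point, so the pattern does not match and the full string "12" would be counted as two decimals.

```r
# Same data as in the "Data" section of this answer.
d <- data.frame(
  x = c(112.21, 56.22, 3.456, 1.1, 3.411),
  y = c(234.511, 1.1111, 2.31, 2.4567, 4.5)
)

# Hypothetical helper: number of digits after the decimal point,
# or 0 when the printed value contains no point at all.
n_decimals <- function(v) {
  s <- as.character(v)
  ifelse(grepl(".", s, fixed = TRUE), nchar(sub("^.*\\.", "", s)), 0L)
}

# One column of counts per column of d (a numeric matrix).
n_dec <- sapply(d, n_decimals)

# Keep the rows where every column has at least 2 decimals.
res <- d[rowSums(n_dec >= 2) == ncol(d), ]
```

Values that R prints in scientific notation (very small or very large numbers) are not handled by this sketch.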

Data:

d <- structure(list(x = c(112.21, 56.22, 3.456, 1.1, 3.411), y = c(234.511, 
1.1111, 2.31, 2.4567, 4.5)), class = "data.frame", row.names = c(NA, 
-5L))

Upvotes: 1
