Jj Blevins

Reputation: 395

test if number is in string in R

I have the following data frame:

df <-
    a    b  c
    20  10 20€
    20€ 10 20 Euro

I want to test whether the number 20 is part of each field. The result should therefore look like this:

[1]true [2]false [3]true
[4]true [5]false [6]true

I tried

grepl(df[3,3], 20)
grepl(df[3,3], "20")

Both return FALSE.

Upvotes: 1

Views: 75

Answers (2)

PKumar

Reputation: 11128

You could use Vectorize() here, or purrr::map_dfr():

Vectorize(grepl, vectorize.args = 'x')(pattern='20', df)
purrr::map_dfr(df, ~grepl('20', .x))

My solution is no better than the other answer (@r2evans's is more elegant). If you want to match strictly 20, you can use word boundaries, \\b20\\b, instead of just 20.

data:

df <- structure(list(a = c("20", "20€"), b = c("10", "10"), c = c("20€", 
"20 Euro")), class = "data.frame", row.names = c(NA, -2L))

Output:

Vectorize(grepl, vectorize.args = 'x')(pattern='20', df)
        a     b     c
[1,] TRUE FALSE  TRUE
[2,] TRUE FALSE  TRUE
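
For what it's worth, the Vectorize() call above amounts to running grepl() on each column in turn, so a plain sapply() over the columns should produce the same kind of matrix. A minimal sketch, assuming data shaped like the question's (an ASCII "EUR" stands in for the euro sign to sidestep encoding issues; that substitution is mine):

```r
# Sample data modeled on the question; "EUR" replaces the euro sign (assumption).
df <- data.frame(
  a = c("20", "20EUR"),
  b = c("10", "10"),
  c = c("20EUR", "20 Euro"),
  stringsAsFactors = FALSE
)

# Apply grepl() column by column; sapply() binds the per-column
# logical vectors into a matrix and keeps the column names.
res <- sapply(df, function(col) grepl("20", col))
print(res)
```

One caveat: with a single-row data frame, sapply() simplifies to a named vector rather than a matrix.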

Upvotes: 0

r2evans

Reputation: 160407

You said you wanted a matrix-like view of logicals. Brian's comment is correct: the pattern comes first. But you also need to account for the structure. grepl(ptn, some_data_frame) returns one logical per column (effectively "all-or-nothing" for each column), while grepl(ptn, some_matrix) returns a logical for every element of the matrix. That result is a plain vector without the matrix's dimensions, which is easy to correct:

`dim<-`(grepl("20", as.matrix(df)), dim(df))
#      [,1]  [,2] [,3]
# [1,] TRUE FALSE TRUE
# [2,] TRUE FALSE TRUE

### or, more eye-friendly
out <- grepl("20", as.matrix(df))
dim(out) <- dim(df)
out
#      [,1]  [,2] [,3]
# [1,] TRUE FALSE TRUE
# [2,] TRUE FALSE TRUE
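
If the column names matter, the same idea can carry them over as well. A small sketch, assuming a data frame like the one in the question (again with ASCII "EUR" in place of the euro sign, which is my substitution), that copies both dim and dimnames from the intermediate matrix:

```r
# Sample data modeled on the question; "EUR" replaces the euro sign (assumption).
df <- data.frame(
  a = c("20", "20EUR"),
  b = c(10, 10),
  c = c("20EUR", "20Euro"),
  stringsAsFactors = FALSE
)

m <- as.matrix(df)            # character matrix, column names preserved
out <- grepl("20", m)         # plain logical vector, one value per cell
dim(out) <- dim(m)            # restore the 2 x 3 shape
dimnames(out) <- dimnames(m)  # restore column (and any row) names
print(out)
```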

BTW: if you want to match any number that contains "20", including 120 and 200, then this is fine. If you want only fields where the number itself is "20" (so neither 120 nor 200 counts), then you need "\\b20\\b" as your pattern. (Thanks Andrew.)
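
To make the difference concrete, here is a short sketch on a made-up vector (the values are mine, chosen to cover the cases mentioned). Note that \\b also needs a non-word character after the 20, so a value like "20Euro" with no space fails the boundary test. If that case should count, one option is a digit lookaround with perl = TRUE, shown last; that alternative is my suggestion, not part of the answer above.

```r
# Hypothetical test values covering the cases discussed.
x <- c("20", "120", "200", "20 Euro", "20Euro")

# Plain substring match: any value containing "20".
plain <- grepl("20", x)
# TRUE TRUE TRUE TRUE TRUE

# Word boundaries: "20" must not touch another letter or digit.
bounded <- grepl("\\b20\\b", x)
# TRUE FALSE FALSE TRUE FALSE

# Lookarounds (PCRE): "20" must not touch another digit, letters are allowed.
no_digits <- grepl("(?<![0-9])20(?![0-9])", x, perl = TRUE)
# TRUE FALSE FALSE TRUE TRUE

print(data.frame(x, plain, bounded, no_digits))
```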


Data:

df <- read.table(header = TRUE, text = "
    a    b  c
    20  10 20€
    20€ 10 20Euro")

BTW: the reason grepl("20", df) returns a vector of length 3 (one value per column) is that grepl() internally converts the data frame to character, and that conversion produces one string per column rather than one per cell:

as.character(df)
# [1] "c(20, 20)" "c(10, 10)" "1:2"      
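
A small sketch of why this matters, using a made-up two-column data frame (not the data above): because each column is collapsed into a single string before matching, one hit anywhere in a column marks the whole column TRUE, even when other cells in it do not contain the pattern.

```r
# Hypothetical data: only the first cell of column a contains "20".
df2 <- data.frame(
  a = c("20", "5"),
  b = c(10, 10),
  stringsAsFactors = FALSE
)

# One string per column, so grepl() can only answer per column.
as_text <- as.character(df2)
per_column <- grepl("20", df2)

# Per-cell answer for comparison.
per_cell <- sapply(df2, function(col) grepl("20", col))

print(as_text)
print(per_column)
print(per_cell)
```

Here per_column reports TRUE for column a as a whole, while per_cell shows that only its first row actually matches.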

Upvotes: 4
