vizidea

Reputation: 193

Using SUBSTR/GREP in R to extract a list of strings from a column?

I know this question has been asked before, and I've been trying to adapt the logic to my situation, but I can't tell what I'm doing wrong.

I have a dataframe where I'm trying to create a new TRUE/FALSE column based on whether an element in another column matches one of the strings I'm searching for.

cpt <- data.frame(value = c("62267", "62268", "62269"))
ex <- data.frame(code = c("2456", "62267", "6200", "62268", "63001", "62269"))

I want TRUE when a string in ex$code equals one of the strings in cpt$value.

I've tried this:

cpt1 <- paste(cpt, collapse = '|')
setDT(ex)[,i4 := str_extract(ex$code, cpt)]

and

setDT(ex)[,i3 := sapply(cpt1, grepl, ex$code)] 

and

setDT(ex)[,i2 := any(grep(cpt1,ex$code))]

but my "i" column always comes out as NULL. I'd like to keep using the data.table package, since I have chains following this snippet of code. What am I doing wrong? Any help or advice would be greatly appreciated!

Upvotes: 0

Views: 655

Answers (2)

B. Christian Kamgang

Reputation: 6489

The TRUE/FALSE column can also be generated with the %chin% operator from the data.table package. It checks whether each element (string) on its left-hand side appears among the strings on its right-hand side.

library(data.table)
setDT(ex)[, i := code %chin% cpt$value]

#      code      i
# 1:   2456  FALSE
# 2:  62267   TRUE
# 3:   6200  FALSE
# 4:  62268   TRUE
# 5:  63001  FALSE
# 6:  62269   TRUE
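
For comparison, base R's %in% performs the same exact-membership test; %chin% is data.table's faster variant for character vectors. Here is a minimal sketch using the question's data. The stringsAsFactors = FALSE argument is an addition, not part of the original code, so that the columns are character vectors on R versions before 4.0 as well.

```r
# Data from the question; stringsAsFactors = FALSE is an added assumption
cpt <- data.frame(value = c("62267", "62268", "62269"),
                  stringsAsFactors = FALSE)
ex <- data.frame(code = c("2456", "62267", "6200", "62268", "63001", "62269"),
                 stringsAsFactors = FALSE)

# Exact membership: TRUE only when the whole code equals one of cpt$value
ex$i <- ex$code %in% cpt$value
print(ex)
```

Because this compares whole strings rather than matching a regular expression, a code such as "162267" would not be flagged just because it contains "62267".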

Upvotes: 1

akrun

Reputation: 887078

We need to create the pattern from a vector instead of a data.frame, i.e. extract the column 'value' and paste its elements together:

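library(data.table)
library(stringr)
# Build the regex alternation from the column vector, not the data.frame
cpt1 <- paste(cpt$value, collapse = '|')
setDT(ex)[, i4 := str_extract(code, cpt1)]  # the matched code, or NA
ex[, i3 := sapply(cpt1, grepl, code)]       # TRUE/FALSE for each row
ex[, i2 := any(grepl(cpt1, code))]          # any() returns one value, recycled to every row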

-output

ex
    code    i4    i3   i2
1:  2456  <NA> FALSE TRUE
2: 62267 62267  TRUE TRUE
3:  6200  <NA> FALSE TRUE
4: 62268 62268  TRUE TRUE
5: 63001  <NA> FALSE TRUE
6: 62269 62269  TRUE TRUE
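
The difference between pasting the data.frame and pasting its column can be checked in base R. The sketch below also shows one possible refinement that is not part of the answer above: since the pattern is unanchored, it matches substrings, so wrapping it in ^( ... )$ restricts it to whole-string matches. The vector `codes` and the names `bad`, `good`, and `anchored` are hypothetical, chosen only for illustration.

```r
cpt <- data.frame(value = c("62267", "62268", "62269"),
                  stringsAsFactors = FALSE)

# Pasting the data.frame itself converts the whole column to one string,
# so the result is not the intended alternation
bad <- paste(cpt, collapse = "|")
# Pasting the column vector gives the intended pattern
good <- paste(cpt$value, collapse = "|")
print(bad)
print(good)

# Hypothetical test codes: an exact match, a longer code, a shorter code
codes <- c("62267", "162267", "6226")
print(grepl(good, codes))      # unanchored: also matches inside "162267"

# Anchoring the alternation requires the whole string to match
anchored <- paste0("^(", good, ")$")
print(grepl(anchored, codes))
```

If only exact equality is needed, a membership test avoids regular expressions altogether; the anchored pattern is mainly useful when the rest of the pipeline already relies on grepl or str_extract.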

Upvotes: 2
