Reputation: 960
I can't find an elegant way to do this; please help.
I have a data.table, DT:
name,value
"lorem pear ipsum",4
"apple ipsum lorem",2
"lorem ipsum plum",6
Based on a character vector Fruits <- c("pear", "apple", "plum"), I'd like to create a factor column:
name,value,factor
"lorem pear ipsum",4,"pear"
"apple ipsum lorem",2,"apple"
"lorem ipsum plum",6,"plum"
I guess that's basic, but I'm kind of stuck. This is how far I got:
DT[grep("apple", name, ignore.case=TRUE), factor := as.factor("apple")]
Thanks in advance.
Upvotes: 4
Views: 1255
Reputation: 1079
Here is my coded solution. The hard part is getting the matched string back from the regex. The best general solution I know of (one that returns whatever is matched by any regular expression) is the regexec and regmatches combination (see below).
# Create the data frame
name <- c("lorem pear ipsum", "apple ipsum lorem", "lorem ipsum plum")
value <- c(4,2,6)
DT <- data.frame(name=name, value=value, stringsAsFactors=FALSE)
# Create the regular expression
Fruits <- c("pear", "apple", "plum")
myRegEx <- paste(Fruits, collapse = "|")
# Find the matches
r <- regexec(myRegEx, DT$name, ignore.case = TRUE)
matches <- regmatches(DT$name, r)
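# Extract the first match per row (NA if nothing matched); lower-case it so
# that a case-insensitive hit such as "Apple" still maps onto Fruits
matched <- sapply(matches, function(x) if (length(x) > 0) tolower(x[[1]]) else NA_character_)
# Add to data frame as a factor (a separate name avoids masking base::factor)
DT$factor <- factor(matched, levels = Fruits)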
This is probably a longer solution than you wanted.
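To illustrate what the regexec and regmatches combination returns, here is a minimal sketch. The second capture group and the "no fruit here" string are made up for the illustration and are not part of the original data. Each list element holds the full match first, followed by one entry per parenthesised group, which is why the code above takes x[[1]]; a string without a match yields an empty character vector.

```r
x <- c("lorem pear ipsum", "no fruit here")

# regexec() records the position of the full match and of every group;
# regmatches() turns those positions back into substrings
m <- regmatches(x, regexec("(pear|apple|plum) (ipsum)", x))

m[[1]]  # "pear ipsum" "pear" "ipsum"
m[[2]]  # character(0), because nothing matched
```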
Upvotes: 2
Reputation: 179418
You can vectorize this with regular expressions, e.g. by using gsub():
Set up the data:
strings <- c("lorem pear ipsum", "apple ipsum lorem", "lorem ipsum plum")
fruit <- c("pear", "apple", "plum")
Now create a regular expression:
ptn <- paste0(".*(", paste(fruit, collapse="|"), ").*")
gsub(ptn, "\\1", strings)
[1] "pear" "apple" "plum"
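The regular expression works by separating each search element with |, all wrapped inside parentheses. The parentheses form a capture group that "\\1" refers back to, while the .* on either side swallows the rest of the string, so the replacement leaves only the matched fruit: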
ptn
[1] ".*(pear|apple|plum).*"
To do this inside a data.table, as per your question, is then as simple as:
library(data.table)
DT <- data.table(name=strings, value=c(4, 2, 6))
DT[, factor := factor(gsub(ptn, "\\1", name))]
DT
                name value factor
1:  lorem pear ipsum     4   pear
2: apple ipsum lorem     2  apple
3:  lorem ipsum plum     6   plum
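Two caveats with this pattern are worth knowing. When no fruit is present, the substitution returns the input string unchanged, and when several fruits are present, the greedy leading .* means the last one is captured. The sketch below guards against the first case; extract_fruit is a hypothetical helper name, and the two extra test strings are invented for the illustration.

```r
fruit <- c("pear", "apple", "plum")
ptn <- paste0(".*(", paste(fruit, collapse = "|"), ").*")

# Hypothetical helper: return the captured fruit, or NA when none is present
extract_fruit <- function(x, pattern) {
  hit <- grepl(pattern, x)
  out <- rep(NA_character_, length(x))
  out[hit] <- sub(pattern, "\\1", x[hit])
  out
}

res <- extract_fruit(c("lorem pear ipsum", "lorem ipsum", "pear and plum"), ptn)
res
# [1] "pear" NA     "plum"
```

If the first fruit in the string is wanted instead, a non-greedy prefix with perl = TRUE would be one option, but that is beyond what the question asks for.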
Upvotes: 6
Reputation: 193517
I don't know if there is a more "data.table" way to do it, but you can try this:
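# For each name, take the first entry of Fruits that it contains (NA if none)
DT[, factor := sapply(name, function(s) {
  Fruits[sapply(Fruits, grepl, x = s, ignore.case = TRUE)][1]
}, USE.NAMES = FALSE)]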
DT
#                 name value factor
# 1:  lorem pear ipsum     4   pear
# 2: apple ipsum lorem     2  apple
# 3:  lorem ipsum plum     6   plum
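A possibly more "data.table" flavoured alternative is to extend the attempt from the question and update by reference once per fruit. This is only a sketch: the column name fruit is an assumption, chosen to avoid a column called factor sitting next to the factor() function, and rows matching several fruits end up with whichever fruit comes last in Fruits.

```r
library(data.table)

DT <- data.table(
  name  = c("lorem pear ipsum", "apple ipsum lorem", "lorem ipsum plum"),
  value = c(4, 2, 6)
)
Fruits <- c("pear", "apple", "plum")

# One by-reference update per fruit; rows that never match stay NA
for (f in Fruits) {
  DT[grepl(f, name, ignore.case = TRUE), fruit := f]
}

# Convert once at the end so every level of Fruits is known to the factor
DT[, fruit := factor(fruit, levels = Fruits)]
```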
Upvotes: 5