Reputation: 12856
I have a data frame with two columns, and want to create a third column which will essentially be a boolean for whether or not column two contains any of a specified set of values.
f <- data.frame(name=c("John", "Sara", "David", "Chad"),
car=c("Honda|Ford", "BMW", "Toyota|Chevy|Ford",
"Toyota|Chevy|Ford|Honda"))
The first thing I did was remove the | from each string in the second column and place those values in a third column:
library(stringr)
g = str_replace_all(f$car, "[^[:alnum:]]", " ")
f$make = c(g)
f
What I want to do now is create another column, which will be a boolean: 1 if make contains a common car, and 0 if it contains an uncommon car.
common = c("Honda", "Ford", "Toyota", "Chevy")
not_common = c("BMW", "Lexus", "Acura")
I've tried a few things, including the stringr package and ifelse, to produce the following desired output.
   name                     car                    make common
1  John              Honda|Ford              Honda Ford      1
2  Sara                     BMW                     BMW      0
3 David       Toyota|Chevy|Ford       Toyota Chevy Ford      1
4  Chad Toyota|Chevy|Ford|Honda Toyota Chevy Ford Honda      1
Since it's possible to have both a common and uncommon car as an entry, the uncommon make should override the common make and that row should take the value 0 in the common column. So if an entry had both BMW and Ford, that entry should take a 0 in the common column.
Can anyone help with this task?
Oh, and here's what I tried with the stringr package, but it doesn't work.
common = c("Honda", "Ford", "Toyota", "Chevy")
not_common = c("BMW", "Lexus", "Acura")
common_match <- str_c(common)
not_match <- str_c(not_common)
main <- function(df) {
f$new_make <- str_detect(f$make, common_match)
df
}
main(f)
Thanks!
Upvotes: 0
Views: 2208
Reputation: 48191
Another way, with a timing comparison against the grep approach:
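f2 <- f[rep(1:4,50000),]  # repeat the 4 rows to get 200,000 rows for timing
system.time({
  # split each make string into its individual makes
  v <- sapply(f2$make, strsplit, " ")
  # 1 only if no uncommon make is present (min) and at least one common make is (max)
  sapply(v, function(x) min(1-not_common %in% x)*max(common %in% x))
})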
user system elapsed
7.94 0.01 8.00
system.time(sapply(f2$car,function(x) ifelse(length(grep("BMW|Lexus|Acura",x))>0,0,1)))
user system elapsed
28.72 0.04 28.87
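For what it's worth, here is a minimal sketch of the same split-and-match idea on a toy vector, so the override rule from the question is easy to check. The "BMW Ford" entry is made up for illustration (it is not in the original data), and `is_common` is just a name I picked.

```r
# Toy input: the last entry is a hypothetical mixed case (uncommon + common)
makes <- c("Honda Ford", "BMW", "Toyota Chevy Ford", "BMW Ford")
common <- c("Honda", "Ford", "Toyota", "Chevy")
not_common <- c("BMW", "Lexus", "Acura")

# Split each entry on spaces so every make is compared as a whole word
parts <- strsplit(makes, " ", fixed = TRUE)

# 1 if at least one common make is present and no uncommon make is present
is_common <- vapply(parts, function(x) {
  as.integer(any(x %in% common) && !any(x %in% not_common))
}, integer(1))

is_common
```

Because this compares whole makes with %in% rather than matching substrings, a make whose name merely contains another make's name would not be picked up by accident.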
Upvotes: 2
Reputation: 93803
Not sure if this is the most efficient way, but try this one using grep and ifelse applied to each value of f$car. The | characters just mean "or" for combining search terms inside grep and have nothing to do with the separator in your data.
f$common <- sapply(f$car,function(x) ifelse(length(grep("BMW|Lexus|Acura",x))>0,0,1))
Result:
> f
   name                     car common
1  John              Honda|Ford      1
2  Sara                     BMW      0
3 David       Toyota|Chevy|Ford      1
4  Chad Toyota|Chevy|Ford|Honda      1
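The same idea can probably be written without sapply, since grepl is already vectorised over its input. A rough sketch, assuming the pattern is built from the not_common vector in the question rather than typed by hand (the `pattern` name is mine):

```r
# Data from the question; stringsAsFactors = FALSE keeps car as character
f <- data.frame(name = c("John", "Sara", "David", "Chad"),
                car = c("Honda|Ford", "BMW", "Toyota|Chevy|Ford",
                        "Toyota|Chevy|Ford|Honda"),
                stringsAsFactors = FALSE)
not_common <- c("BMW", "Lexus", "Acura")

# Collapse the vector into one regular expression: "BMW|Lexus|Acura"
pattern <- paste(not_common, collapse = "|")

# grepl returns one TRUE/FALSE per row; negate so uncommon makes give 0
f$common <- as.integer(!grepl(pattern, f$car))
f
```

Note that this is substring matching, so it assumes no make name is contained inside another one. It also hints at why the stringr attempt in the question misbehaves: str_c(common) without a collapse argument returns the same length-4 vector, so str_detect pairs each pattern with one row instead of testing every make against every row.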
Upvotes: 2