Reputation: 371
Is there a more elegant solution than the code below? Basically, I want to strsplit on a vector of characters. Is there a better approach, for example using %in% or something else?
library(stringr)  # provides str_detect()

data_d <- data.frame(id = c('A', 'B', 'C'),
                     sentence = c('1. this is A sentence',
                                  '2. this is B sentence',
                                  '3. this is C sentence'),
                     stringsAsFactors = FALSE)
listasd <- c('A', 'B', 'C')
data_d$first <- NA
# For each delimiter, keep the text before it in the rows where it occurs.
for (i in listasd)
  data_d$first <- ifelse(str_detect(data_d$sentence, i),
                         sapply(strsplit(data_d$sentence, i), "[", 1),
                         data_d$first)
Upvotes: 1
Views: 71
Reputation: 703
Maybe consider using the stringi package? Here is a slightly more elegant solution:
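library(stringi)

listasd <- c('C', 'A', 'B')
# Split each sentence at the first occurrence of any delimiter and keep the first piece.
stri_split_regex(data_d$sentence, stri_paste(listasd, collapse="|"), n=2, simplify = TRUE)[,1]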
It returns a vector of the interesting parts of the sentences without using sapply:
[1] "1. this is " "2. this is " "3. this is "
So you can write a solution without an explicit loop, which tends to be slow in R:
data_d$first <- stri_split_regex(data_d$sentence, stri_paste(listasd, collapse="|"), n=2, simplify = TRUE)[,1]
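If adding a package is not an option, the same idea (one alternation pattern built from the delimiters) can be sketched in base R. This is only an illustration, not part of the answer above: the variable `pattern` is introduced here, and it assumes the delimiters contain no regex metacharacters.

```r
# Same data as in the question.
data_d <- data.frame(id = c('A', 'B', 'C'),
                     sentence = c('1. this is A sentence',
                                  '2. this is B sentence',
                                  '3. this is C sentence'),
                     stringsAsFactors = FALSE)
listasd <- c('C', 'A', 'B')

# Build a single pattern such as "(C|A|B).*$" and delete everything from
# the first delimiter to the end of each string.
pattern <- paste0('(', paste(listasd, collapse = '|'), ').*$')
data_d$first <- sub(pattern, '', data_d$sentence)
data_d$first
```

As with the stringi version, a sentence that contains none of the delimiters is returned unchanged.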
Upvotes: 1
Reputation: 3728
This gives the same output:
sapply(strsplit(data_d$sentence, c('A','B','C')),'[',1)
# [1] "1. this is " "2. this is " "3. this is "
According to ?strsplit, the split argument can be a character vector, which is recycled along x.
If you try:
sapply(strsplit(data_d$sentence, c('C','B','A')),'[',1)
# "1. this is A sentence" "2. this is " "3. this is C sentence"
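it still runs without error, but the 1st and 3rd strings come back unsplit: each element of split is applied only to the string in the same position, and those strings do not contain 'C' and 'A' respectively. So this approach only gives the desired result when the delimiters are in the same order as the sentences.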
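To make the position dependence concrete, here is a small sketch comparing the element-wise call with a single regex that matches any of the delimiters. The variable names `by_position` and `any_delim` are made up for this illustration.

```r
x <- c('1. this is A sentence',
       '2. this is B sentence',
       '3. this is C sentence')

# Element-wise: split[i] is applied only to x[i], so the order matters.
by_position <- sapply(strsplit(x, c('C', 'B', 'A')), '[', 1)

# Order-independent: one regex alternation that matches A, B, or C.
any_delim <- sapply(strsplit(x, 'A|B|C'), '[', 1)

by_position
any_delim
```

The alternation form does not rely on the delimiters lining up with the rows, which is usually the safer choice when the two vectors are not guaranteed to be in the same order.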
Upvotes: 0
Reputation: 43334
You can just use gsub. The regex matches everything from the first capital letter to the end of the string and replaces it with an empty string. If you have other capitals in your sentences, you'll need to adjust it.
data_d$first <- gsub('[A-Z].*$', '', data_d$sentence)
> data_d
  id              sentence       first
1  A 1. this is A sentence 1. this is
2  B 2. this is B sentence 2. this is
3  C 3. this is C sentence 3. this is
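As one possible way to adjust the pattern when sentences contain other capitals, the sketch below cuts only at a standalone A, B, or C. The sample vector `x` and the result names are invented for illustration, and `perl = TRUE` is used so that `\b` is handled as a word boundary.

```r
# Hypothetical sentences that also start with a capital letter.
x <- c('1. This is A sentence', '2. Here is B sentence')

# Original pattern: cuts at the first capital letter of any kind.
too_greedy <- sub('[A-Z].*$', '', x)

# Narrower pattern: cut only at A, B, or C standing alone as a word.
narrowed <- sub('\\b[ABC]\\b.*$', '', x, perl = TRUE)

too_greedy
narrowed
```

Here `too_greedy` loses most of each sentence because "This" and "Here" begin with capitals, while `narrowed` keeps the text up to the standalone letter.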
Upvotes: 0