Reputation: 177
Let's consider a df with two columns, word and stem. I want to create a new column that checks whether the value in stem is contained in word and whether it is preceded or followed by additional characters. The final result should look like this:
WORD STEM NEW
rerun run prefixed
runner run suffixed
run run none
... ... ...
Below is my code so far. However, it does not work because the grepl expression is applied to all rows of the df at once rather than row by row. Anyway, I think it should make my idea clear.
df$new <- ifelse(grepl(paste0('.+', df$stem, '.+'), df$word), 'both',
ifelse(grepl(paste0(df$stem, '.+'), df$word), 'suffixed',
ifelse(grepl(paste0('.+', df$stem), df$word), 'prefixed','none')))
Upvotes: 3
Views: 921
Reputation: 9247
You can create the new column like this:
df$new <- ifelse(startsWith(df$word, df$stem) & endsWith(df$word, df$stem), 'none',
          ifelse(startsWith(df$word, df$stem), 'suffixed',
          ifelse(endsWith(df$word, df$stem), 'prefixed',
                 'both')))
Or, if you are in a dplyr pipeline and want to avoid all the annoying df$ prefixes:
library(dplyr)

df %>%
  # inside mutate() the columns can be referenced directly, without df$
  mutate(new = ifelse(startsWith(word, stem) & endsWith(word, stem), 'none',
               ifelse(startsWith(word, stem), 'suffixed',
               ifelse(endsWith(word, stem), 'prefixed',
                      'both'))))
Output
# word stem new
# 1 rerun run prefixed
# 2 runner run suffixed
# 3 run run none
# 4 aruna run both
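To make the snippet above easy to try, here is a minimal self-contained sketch. The sample data is reconstructed from the output shown, and classify_affix is a hypothetical helper name introduced only for illustration; it is not part of base R.

```r
# Sample data reconstructed from the output shown above
df <- data.frame(
  word = c("rerun", "runner", "run", "aruna"),
  stem = c("run", "run", "run", "run"),
  stringsAsFactors = FALSE
)

# classify_affix is a hypothetical helper name, not a base R function.
# startsWith() and endsWith() are vectorized over both arguments,
# so each word is compared with the stem in the same row.
classify_affix <- function(word, stem) {
  starts <- startsWith(word, stem)
  ends   <- endsWith(word, stem)
  ifelse(starts & ends, "none",
         ifelse(starts, "suffixed",
                ifelse(ends, "prefixed", "both")))
}

df$new <- classify_affix(df$word, df$stem)
print(df)
```

Note that the final branch returns 'both' for any word that neither starts nor ends with the stem, so a word that does not contain the stem at all would also be labelled 'both'. If that can happen in your data, an extra containment check would be needed.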
Upvotes: 1
Reputation: 39657
You can use mapply to apply grepl row by row:
ifelse(mapply(grepl, paste0(".+", x$STEM, ".+"), x$WORD), "both",
ifelse(mapply(grepl, paste0(x$STEM, ".+"), x$WORD), "suffixed",
ifelse(mapply(grepl, paste0(".+", x$STEM), x$WORD), "prefixed", "none")))
#"prefixed" "suffixed" "none"
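A self-contained sketch of the mapply approach, in case you want to run it directly. The data frame x is an assumption based on the question's table, with a fourth row (walked/walk) added purely for illustration so that the stems differ between rows. rowwise_grepl is a hypothetical wrapper name, and the sketch assumes the stems contain no regex metacharacters.

```r
# Assumed sample data; the last row is added only for illustration
x <- data.frame(
  WORD = c("rerun", "runner", "run", "walked"),
  STEM = c("run", "run", "run", "walk"),
  stringsAsFactors = FALSE
)

# rowwise_grepl is a hypothetical helper: mapply() pairs pattern i with
# string i, and unname() drops the names mapply() derives from the patterns.
rowwise_grepl <- function(pattern, text) {
  unname(mapply(grepl, pattern, text))
}

x$NEW <- ifelse(rowwise_grepl(paste0(".+", x$STEM, ".+"), x$WORD), "both",
         ifelse(rowwise_grepl(paste0(x$STEM, ".+"), x$WORD), "suffixed",
         ifelse(rowwise_grepl(paste0(".+", x$STEM), x$WORD), "prefixed", "none")))
print(x)
```

Because each row is matched against its own stem, the walked/walk row is classified independently of the run rows.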
Or use startsWith and endsWith to compute an index into a vector of labels:
# index 1 = none, 2 = both, 3 = starts with stem (suffixed), 4 = ends with stem (prefixed)
c("none", "both", "suffixed", "prefixed")[1 + (1 + startsWith(x$WORD, x$STEM) +
  2*endsWith(x$WORD, x$STEM)) * (nchar(x$WORD) > nchar(x$STEM) &
  mapply(grepl, x$STEM, x$WORD))]
#[1] "prefixed" "suffixed" "none"
Upvotes: 2
Reputation: 24790
Here's an approach with str_locate from stringr, together with dplyr:
library(dplyr)
library(stringr)
data %>%
  mutate_at(vars(WORD, STEM), as.character) %>%
  mutate(NEW =
           case_when(str_locate(WORD, STEM)[, "start"] > 1 &
                       str_locate(WORD, STEM)[, "end"] < nchar(WORD) ~ "both",
                     str_locate(WORD, STEM)[, "start"] > 1 ~ "prefixed",
                     str_locate(WORD, STEM)[, "end"] < nchar(WORD) ~ "suffixed",
                     TRUE ~ "none"))
WORD STEM NEW
1 rerun run prefixed
2 runner run suffixed
3 run run none
I added a line to convert WORD and STEM to character in case they were factors.
Upvotes: 1