In a data frame, I want to create a new column based on the occurrence of a specific set of strings (a character vector) in another column.
So basically, I want this:
ID  Phrases
1   some words
2   some words dec
3   some words nov may
to return this:
ID  Phrases             MonthsOccur
1   some words          NA
2   some words dec      dec
3   some words nov may  may nov
I have tried the following, and I'm not sure why it's giving me the outcome that it does:
library(dplyr)
library(stringr)  # provides str_detect()
vMonths <- c("jan","feb","mar","apr","may","jun","jul","aug","sept","nov","dec")
a <- c(1,2,3)
b <- c('phrase number one', 'phrase dec', 'phrase nov may')
df <- data.frame(a,b)
names(df) <- c("ID","Phrases")
df <- df %>% mutate(MonthsOccur = paste(vMonths[str_detect(Phrases, vMonths)], collapse = " "))
It gives me the following warning:
Warning message: In stri_detect_regex(string, pattern, negate = negate, opts_regex = opts(pattern)) : longer object length is not a multiple of shorter object length
And the following outcome:
ID  Phrases             MonthsOccur
1   some words          dec
2   some words dec      dec
3   some words nov may  dec
Another option involving dplyr and stringr could be:
df %>%
mutate(MonthsOccur = str_extract_all(Phrases, paste(tolower(month.abb), collapse = "|")))
  ID Phrases            MonthsOccur
1  1 some words
2  2 some words dec     dec
3  3 some words nov may nov, may
The output here is not a character vector, but a list.
If you are indeed looking for a character vector, then with the addition of purrr:
df %>%
mutate(MonthsOccur = map_chr(str_extract_all(Phrases, paste(tolower(month.abb), collapse = "|")),
paste, collapse = ", "))
One option is to apply str_detect rowwise:
library(dplyr)
library(stringr)
df %>%
rowwise() %>%
mutate(MonthsOccur = paste0(vMonths[str_detect(Phrases, vMonths)],
collapse = " "))
However, the future of rowwise was uncertain at the time of writing, so a more robust way is to use map operations:
df %>%
mutate(MonthsOccur = purrr::map_chr(Phrases,
~paste0(vMonths[str_detect(.x, vMonths)], collapse = " ")))
#  ID Phrases           MonthsOccur
#1  1 phrase number one
#2  2 phrase dec        dec
#3  3 phrase nov may    may nov
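As a rough illustration of why the attempt in the question returned dec for every row, the sketch below emulates in base R what a fully vectorised detect does when its two inputs differ in length: the shorter input is recycled and elements are compared pairwise, instead of every phrase being tested against every month. The names pairwise, recycled_match and per_phrase are made up for this example, and grepl with fixed = TRUE stands in for str_detect.

```r
vMonths <- c("jan","feb","mar","apr","may","jun","jul","aug","sept","nov","dec")
phrases <- c("phrase number one", "phrase dec", "phrase nov may")

# Pairwise comparison with recycling: phrase 1 vs "jan", phrase 2 vs "feb",
# phrase 3 vs "mar", phrase 1 vs "apr", and so on. Lengths 3 and 11 do not
# divide evenly, which is what the recycling warning complains about.
recycled <- rep_len(phrases, length(vMonths))
pairwise <- mapply(function(m, p) grepl(m, p, fixed = TRUE), vMonths, recycled)
recycled_match <- unname(vMonths[pairwise])

# Inside mutate() without rowwise(), this single result is pasted once and
# then repeated for every row.
recycled_match

# Per-phrase comparison: each phrase is tested against all months,
# which is what rowwise() or map_chr() achieves.
per_phrase <- vapply(
  phrases,
  function(p) {
    hits <- vapply(vMonths, grepl, logical(1), x = p, fixed = TRUE)
    paste(vMonths[hits], collapse = " ")
  },
  character(1),
  USE.NAMES = FALSE
)
per_phrase
```

Only the eleventh pair (the recycled "phrase dec" against "dec") matches, so the pairwise result is a single month that ends up in every row.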
A base R option would be with regmatches and gregexpr:
sapply(regmatches(df$Phrases, gregexpr(paste0(vMonths, collapse = "|"),
df$Phrases)), paste0, collapse = " ")
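The question's desired output shows NA for rows without any month, whereas collapsing an empty set of matches gives an empty string. A minimal sketch of one way to close that gap in base R follows; the data frame is rebuilt here with plain character columns for illustration, and found and months_occur are names chosen for this example.

```r
# Illustrative data mirroring the question's example.
vMonths <- c("jan","feb","mar","apr","may","jun","jul","aug","sept","nov","dec")
df <- data.frame(ID = 1:3,
                 Phrases = c("phrase number one", "phrase dec", "phrase nov may"),
                 stringsAsFactors = FALSE)

# One alternation pattern covering all months.
pattern <- paste0(vMonths, collapse = "|")
found <- regmatches(df$Phrases, gregexpr(pattern, df$Phrases))

# Collapse each row's matches into one string (empty when nothing matched).
months_occur <- vapply(found, paste0, character(1), collapse = " ")

# Turn empty strings into NA so "no month found" is explicit.
months_occur[!nzchar(months_occur)] <- NA_character_
df$MonthsOccur <- months_occur
df
```

With this approach the months appear in the order they occur in the phrase (nov may), not in the order of vMonths.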
data
df <- structure(list(ID = c(1, 2, 3), Phrases = structure(c(3L, 1L,
2L), .Label = c("phrase dec", "phrase nov may", "phrase number one"
), class = "factor")), class = "data.frame", row.names = c(NA, -3L))