BroQ

Reputation: 101

Mutate changes full column instead of row by row

In a data frame, I want to create a new column based on the occurrence of a specific set of strings (a character vector) in another column.

So basically, I want this:

ID  Phrases
1   some words
2   some words dec
3   some words nov may

to return this:

ID  Phrases             MonthsOccur
1   some words          NA
2   some words dec      dec
3   some words nov may  may nov

I have tried the following, and I'm not sure why it's giving me the outcome that it does:

library(dplyr)
library(stringr)  # str_detect() comes from stringr

vMonths <- c("jan","feb","mar","apr","may","jun","jul","aug","sept","nov","dec")

a <- c(1,2,3)
b <- c('phrase number one', 'phrase dec','phrase nov')

df <- data.frame(a,b)
names(df) <- c("ID","Phrases")
df <- df %>% mutate(MonthsOccur = paste(vMonths[str_detect(Phrases, vMonths)],collapse=" "))

It gives me the following warning:

Warning message: In stri_detect_regex(string, pattern, negate = negate, opts_regex = opts(pattern)) : longer object length is not a multiple of shorter object length

And the following outcome:

ID  Phrases             MonthsOccur
1   some words          dec
2   some words dec      dec
3   some words nov may  dec

Upvotes: 4

Views: 279

Answers (2)

tmfmnk

Reputation: 39858

Another option involving dplyr and stringr could be:

df %>%
 mutate(MonthsOccur = str_extract_all(Phrases, paste(tolower(month.abb), collapse = "|")))

  ID            Phrases MonthsOccur
1  1         some words            
2  2     some words dec         dec
3  3 some words nov may    nov, may

The output here is not a character vector, but a list.

If you are indeed looking for a character vector, then with the addition of purrr:

df %>%
 mutate(MonthsOccur = map_chr(str_extract_all(Phrases, paste(tolower(month.abb), collapse = "|")), 
                              paste, collapse = ", "))

Upvotes: 0

Ronak Shah

Reputation: 388907

One option is to apply str_detect row by row with rowwise():

library(dplyr)
library(stringr)

df %>%
  rowwise() %>%
  mutate(MonthsOccur = paste0(vMonths[str_detect(Phrases, vMonths)], 
                       collapse = " "))

However, rowwise() may or may not be continued in the future, so a better way is to use map operations:

df %>%
  mutate(MonthsOccur = purrr::map_chr(Phrases,  
                      ~paste0(vMonths[str_detect(.x, vMonths)], collapse = " ")))

#  ID           Phrases MonthsOccur
#1  1 phrase number one            
#2  2        phrase dec         dec
#3  3    phrase nov may     may nov
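
As for why the original attempt fills every row with the same value: without rowwise() or map, Phrases inside mutate() is the whole column, so str_detect(Phrases, vMonths) compares a length-3 vector with a length-11 vector element by element (hence the length warning), and paste(..., collapse = " ") reduces the result to one string that is recycled to all rows. A minimal base R sketch of the per-row evaluation that both versions above perform, assuming tolower(month.abb) as the month vector and a hypothetical helper named months_in:

```r
# Assumption: month abbreviations taken from base R's month.abb constant
vMonths <- tolower(month.abb)
phrases <- c("phrase number one", "phrase dec", "phrase nov may")

# Hypothetical helper: months found in ONE phrase, in vMonths order
months_in <- function(phrase) {
  # Test each month against this single phrase (fixed = literal match)
  hit <- vapply(vMonths, grepl, logical(1), x = phrase, fixed = TRUE)
  paste(vMonths[hit], collapse = " ")
}

# Apply the helper to each phrase separately, one result per row
res <- vapply(phrases, months_in, character(1), USE.NAMES = FALSE)
print(res)
```

Because the helper only ever sees one phrase at a time, nothing is recycled across rows.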

A base R option would be with regmatches and gregexpr:

sapply(regmatches(df$Phrases, gregexpr(paste0(vMonths, collapse = "|"),
        df$Phrases)), paste0, collapse = " ")
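
If you want NA for rows without a match and the months in calendar order, as in the desired output of the question, the same idea could be extended roughly like this. This is only a sketch: it assumes tolower(month.abb) as the month vector, and the name month_abbr is made up for illustration.

```r
# Assumption: month abbreviations taken from base R's month.abb constant
month_abbr <- tolower(month.abb)
phrases <- c("some words", "some words dec", "some words nov may")

# One character vector of matches per phrase (character(0) if none)
found <- regmatches(phrases,
                    gregexpr(paste(month_abbr, collapse = "|"), phrases))

MonthsOccur <- vapply(found, function(m) {
  # No match at all: return NA instead of an empty string
  if (length(m) == 0) return(NA_character_)
  # Keep calendar order (and drop repeats) by filtering month_abbr itself
  paste(month_abbr[month_abbr %in% m], collapse = " ")
}, character(1))

print(MonthsOccur)
```

Note that the pattern matches substrings, so a word that merely contains a month abbreviation would also count as a hit.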

data

df <- structure(list(ID = c(1, 2, 3), Phrases = structure(c(3L, 1L, 
2L), .Label = c("phrase dec", "phrase nov may", "phrase number one"
), class = "factor")), class = "data.frame", row.names = c(NA, -3L))

Upvotes: 3
