Jan

Reputation: 65

In R: Searching a column for different string patterns and replacing all of them

I have a column with different game titles. In order to collect them, I have to change all of them to a single spelling. For example, I have:

str_replace_all(FavouriteGames_DF$FavGame1, pattern = c("SKYRIM|
                                          THE ELDER SCROLLS V: SKYRIM|
                                          ELDER SCROLLS SKYRIM|
                                          ELDER SCROLLS V SKYRIM|
                                          SKYRIM (BETHESDA 2011)|
                                          SKYRIM (MODDED)|
                                          THE ELDERSCROLLS V: SKYRIM"), 
            replacement = "THE ELDER SCROLLS 5: SKYRIM")

The problem is that str_replace_all is kinda bad for this: it can't just search for any matching pattern and replace it with the replacement, but apparently has to go through the patterns in order, and I can't predict where in the data set which term will show up. I do not want the function to replace partial matches (i.e., turning "THE ELDERSCROLLS V: SKYRIM" into "THE ELDERSCROLLS V: THE ELDER SCROLLS 5: SKYRIM"). Putting the patterns into pattern = c("1", "2") does not work at all either, because it can only check for the patterns in order.

I also tried the FindReplace function from the DataCombine package, but that one doesn't seem to work either for reasons I do not quite understand (claiming I am missing dimensions and the vector not being a character vector). Anyway, I want to use as few packages as possible and would prefer to stay in the tidyverse.

Does anybody have a good solution? I do not want to search for each term on its own, as I have to do this a lot and I already have to do it for 6 columns, since mutate_at doesn't seem to work with str_replace.

Thanks!

Upvotes: 0

Views: 292

Answers (2)

NiklasvMoers

Reputation: 329

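My comment as an answer, where pattern is a character vector of the exact titles to replace and replacement is the single spelling you want:

FavouriteGames_DF$FavGame1[FavouriteGames_DF$FavGame1 %in% pattern] <- replacement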
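Since the question mentions six columns, here is a minimal base R sketch of the same exact-match idea applied to several columns at once. The example data and the second column name FavGame2 are made up for illustration; pattern holds the spellings from the question as separate strings rather than one regex, so partial matches are never touched.

```r
# Hypothetical example data; the real FavouriteGames_DF will look different.
FavouriteGames_DF <- data.frame(
  FavGame1 = c("SKYRIM", "THE ELDERSCROLLS V: SKYRIM", "PORTAL 2"),
  FavGame2 = c("SKYRIM (MODDED)", "TETRIS", "ELDER SCROLLS SKYRIM"),
  stringsAsFactors = FALSE
)

# Exact spellings to unify, one string per spelling (taken from the question).
pattern <- c(
  "SKYRIM",
  "THE ELDER SCROLLS V: SKYRIM",
  "ELDER SCROLLS SKYRIM",
  "ELDER SCROLLS V SKYRIM",
  "SKYRIM (BETHESDA 2011)",
  "SKYRIM (MODDED)",
  "THE ELDERSCROLLS V: SKYRIM"
)
replacement <- "THE ELDER SCROLLS 5: SKYRIM"

# Columns to clean (FavGame2 is an assumed name).
cols <- c("FavGame1", "FavGame2")

# For each column, overwrite only the entries that equal one of the spellings.
FavouriteGames_DF[cols] <- lapply(FavouriteGames_DF[cols], function(column) {
  column[column %in% pattern] <- replacement
  column
})
```

Because %in% compares whole strings, the order of the spellings does not matter, and characters such as the parentheses in "SKYRIM (MODDED)" need no escaping.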

Upvotes: 1

eduardokapp

Reputation: 1751

A handy solution would be to just use "SKYRIM" as the pattern, since it is the one word all the patterns you listed have in common. You could define a very simple function that checks a single title for that word and then apply it to every element of the column you want to clean, for example with vapply:

check <- function(x){
    # Split one title into its space-separated words
    y <- unlist(strsplit(x, " "))
    if("SKYRIM" %in% y)
        return("THE ELDER SCROLLS 5: SKYRIM")
    else
        return(x)
}

# Apply check() to each title separately; the column must be a character vector
FavouriteGames_DF$FavGame1 <- vapply(FavouriteGames_DF$FavGame1, check, character(1), USE.NAMES = FALSE)
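The same keyword idea can also be written in a vectorized way, so it works on a whole column in one call. This is only a sketch: normalise_skyrim and the games vector are names invented for this example, and the word-boundary regex assumes titles are upper case, as in the question.

```r
# Hypothetical helper: replace every title containing the word SKYRIM.
normalise_skyrim <- function(x) {
  # \\b marks word boundaries, so "SKYRIM" only matches as a whole word.
  has_skyrim <- grepl("\\bSKYRIM\\b", x, perl = TRUE)
  x[has_skyrim] <- "THE ELDER SCROLLS 5: SKYRIM"
  x
}

# Made-up sample titles, including a missing value.
games <- c("SKYRIM (MODDED)", "PORTAL 2", "THE ELDERSCROLLS V: SKYRIM", NA)
cleaned <- normalise_skyrim(games)
```

Keep in mind that this folds every title containing the word SKYRIM into the same spelling, which is broader than matching an explicit list of titles.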

Upvotes: 1
