Pavel Shliaha

Reputation: 935

How to replace values at certain indices in a column based on the presence of a string in that column's values (using dplyr, vectorized, without a loop)?

Example using mtcars:

data(mtcars)
mtcars$car <- row.names(mtcars)

In the car column I have car names such as "Mazda RX4", "Mazda RX4 Wag", "Datsun 710", "Hornet 4 Drive", etc. Suppose I want to remove the model and keep only the manufacturer, e.g. "Mazda", "Datsun", "Hornet". Also assume the names are not formatted with the manufacturer always as the first word: a name could also be "ModelX Mazda" or "model Tesla XX", so I can't simply extract the first word of the string.

How would you go about this task if you had a vector containing all the manufacturers' names, c("Mazda", "Datsun", "Hornet")?

Upvotes: 2

Views: 530

Answers (3)

IceCreamToucan

Reputation: 28675

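You could use the fuzzyjoin package and do a regex_left_join, which treats each entry of to_match as a regular expression and joins it to every car name it matches (cars with no match get NA):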

to_match <- c("Mazda", "Datsun", "Hornet")

library(tidyverse)

df <-
  datasets::mtcars %>%   # fresh copy; rownames_to_column() errors if 'car' already exists
  rownames_to_column('car')

library(fuzzyjoin)

df %>% 
  regex_left_join(tibble(to_match), by = c('car' = 'to_match')) %>% 
  select(car, to_match) %>% 
  head
#>                 car to_match
#> 1         Mazda RX4    Mazda
#> 2     Mazda RX4 Wag    Mazda
#> 3        Datsun 710   Datsun
#> 4    Hornet 4 Drive   Hornet
#> 5 Hornet Sportabout   Hornet
#> 6           Valiant     <NA>

Created on 2021-05-16 by the reprex package (v2.0.0)
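To see what the join is doing, here is a minimal base-R sketch of the same idea (not fuzzyjoin's actual implementation): every (car, pattern) pair is tested with grepl, and each car gets the first pattern in to_match that matches it, or NA. The car vector and the names hits, idx and manufacturer are hypothetical, for illustration only. One difference: regex_left_join returns one row per match, so a car matching several patterns would be duplicated, whereas this sketch keeps only the first match.

```r
to_match <- c("Mazda", "Datsun", "Hornet")
# Hypothetical sample names, including one with the manufacturer last
car <- c("Mazda RX4", "Datsun 710", "Valiant", "ModelX Mazda")

# Logical matrix: one row per car, one column per pattern
# (assumes car has more than one element; otherwise sapply returns a vector)
hits <- sapply(to_match, grepl, x = car)

# Column index of the first matching pattern in each row (1 if nothing matches)
idx <- apply(hits, 1, which.max)

# Use the matched pattern where there is one, NA otherwise
manufacturer <- ifelse(rowSums(hits) > 0, to_match[idx], NA)
data.frame(car, manufacturer)
# manufacturer: "Mazda" "Datsun" NA "Mazda"
```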

Upvotes: 1

Onyambu

Reputation: 79198

You could also use str_extract as shown below:

library(stringr)
v <- c("Mazda", "Datsun", "Hornet")

str_extract(mtcars$car, str_c(v, collapse = '|'))

If a manufacturer's name could also appear inside a longer word (for example as part of a model name), wrap the alternation in word boundaries (\\b) so that only whole words match, i.e.

str_extract(mtcars$car, sprintf("\\b(%s)\\b", str_c(v, collapse = '|')))
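The effect of the boundaries can be checked without stringr. The sketch below uses a base-R stand-in for str_extract (regexpr plus regmatches, returning NA where nothing matches). extract_first, cars and makers are hypothetical names, and "Mercury Cougar" is a made-up car name chosen only because it contains "Merc" as a substring.

```r
# Hypothetical names: "Mercury Cougar" contains "Merc" but is not a Merc
cars   <- c("Merc 240D", "Mercury Cougar", "Mazda RX4")
makers <- c("Merc", "Mazda")

# Base-R stand-in for str_extract(): first match per string, NA if none
extract_first <- function(x, pattern) {
  m <- regexpr(pattern, x, perl = TRUE)  # start of first match, -1 if none
  out <- rep(NA_character_, length(x))
  out[m != -1] <- regmatches(x, m)       # regmatches() returns only the matches
  out
}

plain   <- paste(makers, collapse = "|")
bounded <- sprintf("\\b(%s)\\b", plain)

extract_first(cars, plain)    # "Merc" "Merc" "Mazda"
extract_first(cars, bounded)  # "Merc" NA     "Mazda"
```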

Upvotes: 1

akrun

Reputation: 886948

If there is a vector of patterns, we can collapse it into a single regex alternation with paste

v1 <- c("Mazda", "Datsun", "Hornet")
pat <- paste0(".*\\b(", paste(v1, collapse="|"), ")\\b.*")

then use sub to capture the matched manufacturer as a group and replace the whole string with it

mtcars$car[2] <- "RX4 Mazda Wag"  # changed for testing
out <- sub(pat, "\\1", mtcars$car)
head(out, 5)
#[1] "Mazda"  "Mazda"  "Datsun" "Hornet" "Hornet"
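Note that sub returns non-matching strings unchanged, so a name with no listed manufacturer (e.g. "Valiant") stays as it is, whereas str_extract would give NA. If you want NA instead, or want to touch only the matching indices, you can locate them first with grepl. A small sketch on a hypothetical vector car (hit, kept and as_na are illustrative names):

```r
v1  <- c("Mazda", "Datsun", "Hornet")
pat <- paste0(".*\\b(", paste(v1, collapse = "|"), ")\\b.*")

# Hypothetical names; "Valiant" contains no listed manufacturer
car <- c("Mazda RX4", "RX4 Mazda Wag", "Valiant", "model Hornet XX")

hit   <- grepl(pat, car)                         # which indices contain a manufacturer
kept  <- sub(pat, "\\1", car)                    # unmatched names are left unchanged
as_na <- ifelse(hit, sub(pat, "\\1", car), NA)   # unmatched names become NA

kept   # "Mazda" "Mazda" "Valiant" "Hornet"
as_na  # "Mazda" "Mazda" NA        "Hornet"
```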

Or, using dplyr:

library(dplyr)
library(stringr)
mtcars <- mtcars %>%
  mutate(car = str_replace(car, pat, '\\1'))

Upvotes: 2
