Reputation: 935
Example using mtcars:
data(mtcars)
mtcars$car <- row.names(mtcars)
The car column holds car names such as "Mazda RX4", "Mazda RX4 Wag", "Datsun 710", "Hornet 4 Drive", etc. Suppose I want to drop the model and keep only the manufacturer, e.g. "Mazda", "Datsun", "Hornet". Also assume the names are not formatted so that the manufacturer is always the first word: a car could be named "ModelX Mazda" or "model Tesla XX", so I can't simply extract the first word of the string.
How would you go about this task if you had a character vector containing all the manufacturers' names, c("Mazda", "Datsun", "Hornet")?
Upvotes: 2
Views: 530
Reputation: 28675
You could use the fuzzyjoin package and do a regex_left_join, which joins each car name to every manufacturer pattern it matches:
to_match <- c("Mazda", "Datsun", "Hornet")
library(tidyverse)
df <-
mtcars %>%
rownames_to_column('car')
library(fuzzyjoin)
df %>%
regex_left_join(tibble(to_match), by = c('car' = 'to_match')) %>%
select(car, to_match) %>%
head
#> car to_match
#> 1 Mazda RX4 Mazda
#> 2 Mazda RX4 Wag Mazda
#> 3 Datsun 710 Datsun
#> 4 Hornet 4 Drive Hornet
#> 5 Hornet Sportabout Hornet
#> 6 Valiant <NA>
Created on 2021-05-16 by the reprex package (v2.0.0)
Upvotes: 1
Reputation: 79198
You could also use str_extract, as shown below:
library(stringr)
v <- c("Mazda", "Datsun", "Hornet")
str_extract(mtcars$car, str_c(v, collapse = '|'))
Of course, if a manufacturer's name might occur inside a longer word (for example, within a model name), you should wrap the pattern in word boundaries, i.e.:
str_extract(mtcars$car, sprintf("\\b(%s)\\b", str_c(v, collapse = '|')))
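To see what the word boundaries change, here is a minimal sketch on made-up data. The vector `cars` and the name "Datsunny" are hypothetical and exist only to show a manufacturer name embedded in a longer word; they are not in mtcars.

```r
library(stringr)

# Hypothetical car names; "Datsunny" merely contains "Datsun" as a substring
cars <- c("Mazda RX4", "ModelX Mazda", "Datsunny 100", "Valiant")
v <- c("Mazda", "Datsun", "Hornet")

# Alternation without and with word boundaries
plain   <- str_c(v, collapse = "|")
bounded <- sprintf("\\b(%s)\\b", plain)

# The plain pattern matches "Datsun" inside "Datsunny"
res_plain <- str_extract(cars, plain)

# The bounded pattern only matches whole words, so "Datsunny" gives NA
res_bounded <- str_extract(cars, bounded)

print(res_plain)
print(res_bounded)
```

In both cases str_extract returns NA for names that match no manufacturer, and the position of the manufacturer within the name does not matter.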
Upvotes: 1
Reputation: 886948
If there is a vector of patterns, we can collapse it into a single regex string with paste:
v1 <- c("Mazda", "Datsun", "Hornet")
pat <- paste0(".*\\b(", paste(v1, collapse="|"), ")\\b.*")
Then use sub and capture the matched manufacturer as a group:
mtcars$car[2] <- "RX4 Mazda Wag" # changed for testing
out <- sub(pat, "\\1", mtcars$car)
head(out, 5)
#[1] "Mazda" "Mazda" "Datsun" "Hornet" "Hornet"
Or using dplyr and stringr, with the same pat:
library(dplyr)
library(stringr)
mtcars <- mtcars %>%
mutate(car = str_replace(car, pat, '\\1'))
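One thing to keep in mind: sub and str_replace return the input unchanged when the pattern does not match, so a car with no listed manufacturer (e.g. "Valiant") keeps its full name. If NA is preferable in that case, a minimal base R sketch could look like this. The vector `cars` and the result names `replaced` and `maker` are made up for illustration.

```r
# Hypothetical input; "Valiant" matches none of the manufacturers
cars <- c("Mazda RX4", "RX4 Mazda Wag", "Datsun 710", "Valiant")
v1 <- c("Mazda", "Datsun", "Hornet")
pat <- paste0(".*\\b(", paste(v1, collapse = "|"), ")\\b.*")

# sub() leaves non-matching strings as they are
replaced <- sub(pat, "\\1", cars)

# Keep the captured manufacturer only where the pattern matched, NA otherwise
maker <- ifelse(grepl(pat, cars), replaced, NA_character_)

print(replaced)
print(maker)
```

This mirrors the NA behaviour of str_extract and regex_left_join in the other answers.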
Upvotes: 2