Reputation: 124
I have a vector of strings in R that I need to match against a column in my data frame, keeping only the matched string in each row where one is found.
list <- c('Ford', 'Toyota', 'BMW')
Col1 Col2
1 Ford A1
2 Toyota Prius
3 BMW B2
4 Ford A2
5 Tesla T1
So I want to match Col2 with list and then change the data to:
Col1 Col2
1 Ford
2 Toyota
3 BMW
4 Ford
5 Tesla T1
Upvotes: 0
Views: 1056
Reputation: 5580
You can use your list to create a regex string, which can then be used in a sub call:
regex.string <- paste0( ".*(", paste( list, collapse = "|" ), ").*" )
This makes the string:
> regex.string
[1] ".*(Ford|Toyota|BMW).*"
Now use that in a sub call:
df$Col2 <- sub( regex.string, "\\1", df$Col2 )
The regex looks for any value contained in list; if one is found, sub replaces the entire string with the value that matched. Strings with no match, such as Tesla T1, are returned unchanged.
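To try this end to end, here is a minimal reproducible sketch. The data frame is reconstructed from the question, assuming Col1 is a plain index and Col2 holds the full text; the name makes is used here instead of list only to avoid masking base::list.

```r
# Assumed reconstruction of the question's data
makes <- c("Ford", "Toyota", "BMW")
df <- data.frame(
  Col1 = 1:5,
  Col2 = c("Ford A1", "Toyota Prius", "BMW B2", "Ford A2", "Tesla T1"),
  stringsAsFactors = FALSE
)

# Build one alternation pattern with a capture group around the makes
regex.string <- paste0(".*(", paste(makes, collapse = "|"), ").*")

# Replace the whole string with the captured make; non-matching rows are untouched
df$Col2 <- sub(regex.string, "\\1", df$Col2)
```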
Result:
> df
Col1 Col2
1 1 Ford
2 2 Toyota
3 3 BMW
4 4 Ford
5 5 Tesla T1
NOTE: this will likely break for car makes containing special regex characters (such as ., ( or |), because sub treats them as regex syntax rather than literal text.
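One possible way around that, sketched here as an alternative rather than a fix to the regex itself, is to skip regex entirely and test each make as a literal string with grepl(..., fixed = TRUE). The make "Alfa (Romeo)" and the sample values below are hypothetical, chosen only because the parentheses would be read as a group in a regex.

```r
# Hypothetical data: one make contains regex metacharacters
makes <- c("Ford", "Alfa (Romeo)")
col2 <- c("Ford A1", "Alfa (Romeo) Giulia", "Tesla T1")

# For each make, find the rows containing it literally and overwrite them
for (make in makes) {
  hit <- grepl(make, col2, fixed = TRUE)
  col2[hit] <- make
}
```

If a string contains more than one make, the make that comes last in makes wins under this loop, so the order of the vector matters.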
Upvotes: 1