marreBUS

Reputation: 124

Find matching string in list and only keep the matching string

I have a character vector in R (named list) containing several strings. I need to match them against a column in my data frame and, wherever a row matches, keep only the matching string.

list <- c('Ford', 'Toyota', 'BMW')

Col1         Col2         
1            Ford A1
2            Toyota Prius
3            BMW B2
4            Ford A2
5            Tesla T1

So I want to match Col2 with list and then change the data to:

Col1         Col2         
1            Ford
2            Toyota
3            BMW
4            Ford
5            Tesla T1

Upvotes: 0

Views: 1056

Answers (1)

rosscova

Reputation: 5580

You can use your list to create a regex string, which can then be used in a sub call:

regex.string <- paste0( ".*(", paste( list, collapse = "|" ), ").*" )

This makes the string:

> regex.string
[1] ".*(Ford|Toyota|BMW).*"

Now use that in a sub call:

df$Col2 <- sub( regex.string, "\\1", df$Col2 )

The regex looks for any value contained in list. If one is found, sub replaces the entire string with the captured match; strings with no match (such as "Tesla T1") are left unchanged.

Result:

> df
  Col1     Col2
1    1     Ford
2    2   Toyota
3    3      BMW
4    4     Ford
5    5 Tesla T1
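
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The data frame is reconstructed from the table in the question, and the vector is renamed to makes (a name assumed here, not from the original) only to avoid masking base R's list() function.

```r
# Assumed reconstruction of the question's data; 'makes' is a hypothetical
# name used instead of 'list' so that base::list() is not masked.
makes <- c("Ford", "Toyota", "BMW")

df <- data.frame(
  Col1 = 1:5,
  Col2 = c("Ford A1", "Toyota Prius", "BMW B2", "Ford A2", "Tesla T1"),
  stringsAsFactors = FALSE
)

# Build ".*(Ford|Toyota|BMW).*": the parentheses capture whichever make matches
regex.string <- paste0(".*(", paste(makes, collapse = "|"), ").*")

# Replace the whole string with the captured make; non-matching rows are untouched
df$Col2 <- sub(regex.string, "\\1", df$Col2)

print(df)
```

Because the leading .* is greedy, a string containing more than one listed make would keep the last one found, which may or may not be what you want.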

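NOTE: as mentioned below, this will likely break for car makes containing special regex characters (for example . or + or parentheses), since they are interpreted as regex syntax rather than literal text.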
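
One possible way around that, sketched here under assumptions: escape the metacharacters in each make before building the pattern. The helper escape_regex and the make "A+ Motors" are hypothetical, invented purely for illustration; neither comes from the question or from base R.

```r
# Hypothetical helper: put a backslash in front of common regex metacharacters
escape_regex <- function(x) {
  gsub("([.\\\\|()\\[\\]{}^$*+?])", "\\\\\\1", x, perl = TRUE)
}

# "A+ Motors" is a made-up make containing a special character
makes <- c("Ford", "A+ Motors")
cars  <- c("Ford A1", "A+ Motors X1", "Tesla T1")

# Same pattern construction as above, but on the escaped makes
pattern <- paste0(".*(", paste(escape_regex(makes), collapse = "|"), ").*")

cleaned <- sub(pattern, "\\1", cars)
print(cleaned)
```

Without the escaping step, the + would be read as a quantifier and "A+ Motors X1" would not be matched literally.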

Upvotes: 1
