r_a
r_a

Reputation: 107

Matching a exact word from a column to a string in another column and remove the matching word from string of other column

Here is the example database that I have.

column1           Column2
STELLARN714WPUR   STELLARN594WPUR,STELLARN714WPUR,STELLARN814WPUR
STELLARN714WRED   STELLARN594WRED,STELLARN814WRED,STELLARN714WRED
STELLARN814WRED   STELLARN594WRED,STELLARN714WRED,STELLARN814WRED

I have to match the value's from column1 to the string in column2 & if the exact match is in found in the column2 string, the matching value needs to be removed from the string.

Eg desired output:

column1           Column2
STELLARN714WPUR   STELLARN594WPUR,STELLARN814WPUR
STELLARN714WRED   STELLARN594WRED,STELLARN814WRED,
STELLARN814WRED   STELLARN594WRED,STELLARN714WRED,

I tried stringer and gsub, but they didn't help. Any help would be highly appreciated. Thank you.

Upvotes: 0

Views: 419

Answers (2)

lil_barnacle
lil_barnacle

Reputation: 188

You can use str_extract and gsub. I also added a scenario in row 4 where column 1 value doesn't match with column 2. You can find the final output in column 3.

library(stringr)
library(dplyr)

col1 <- c("STELLARN714WPUR", "STELLARN714WRED", "STELLARN814WRED", "AB")
col2 <- c("STELLARN594WPUR,STELLARN714WPUR,STELLARN814WPUR", "STELLARN594WRED,STELLARN814WRED,STELLARN714WRED", "STELLARN594WRED,STELLARN714WRED,STELLARN814WRED", "STELLARN594WPUR,STELLARN714WPUR,STELLARN814WPUR")

df <- data.frame(column1  = col1, Column2 = col2, stringsAsFactors = FALSE)
df
          column1                                         Column2
1 STELLARN714WPUR STELLARN594WPUR,STELLARN714WPUR,STELLARN814WPUR
2 STELLARN714WRED STELLARN594WRED,STELLARN814WRED,STELLARN714WRED
3 STELLARN814WRED STELLARN594WRED,STELLARN714WRED,STELLARN814WRED
4              AB STELLARN594WPUR,STELLARN714WPUR,STELLARN814WPUR

df %>%
  mutate(match_val = str_extract(Column2, column1),
         Column3 = ifelse(is.na(match_val), Column2, 
                          str_replace(Column2, paste0(match_val,",|, ",match_val), "")))

          column1                                         Column2       match_val
1 STELLARN714WPUR STELLARN594WPUR,STELLARN714WPUR,STELLARN814WPUR STELLARN714WPUR
2 STELLARN714WRED STELLARN594WRED,STELLARN814WRED,STELLARN714WRED STELLARN714WRED
3 STELLARN814WRED STELLARN594WRED,STELLARN714WRED,STELLARN814WRED STELLARN814WRED
4              AB STELLARN594WPUR,STELLARN714WPUR,STELLARN814WPUR            <NA>
                                          Column3
1                 STELLARN594WPUR,STELLARN814WPUR
2 STELLARN594WRED,STELLARN814WRED,STELLARN714WRED
3 STELLARN594WRED,STELLARN714WRED,STELLARN814WRED
4 STELLARN594WPUR,STELLARN714WPUR,STELLARN814WPUR

Upvotes: 1

Ronak Shah
Ronak Shah

Reputation: 389355

You can use str_remove :

library(stringr)
str_remove(df$Column2, df$column1)
#[1] "STELLARN594WPUR,,STELLARN814WPUR" "STELLARN594WRED,STELLARN814WRED,"
#[3] "STELLARN594WRED,STELLARN714WRED,"

To clean up the string from additional commas we can use gsub and trimws.

gsub(',{1,}', ',', trimws(str_remove(df$Column2, df$column1), whitespace = ','))

#[1] "STELLARN594WPUR,STELLARN814WPUR" "STELLARN594WRED,STELLARN814WRED"
#[3] "STELLARN594WRED,STELLARN714WRED"

Another option in base R is to split the string on comma and select the string which is not present in column1.

df$Column2 <- mapply(function(x, y) toString(setdiff(x, y)), 
                     strsplit(df$Column2, ','), df$column1)

df

#          column1                          Column2
#1 STELLARN714WPUR STELLARN594WPUR, STELLARN814WPUR
#2 STELLARN714WRED STELLARN594WRED, STELLARN814WRED
#3 STELLARN814WRED STELLARN594WRED, STELLARN714WRED 

data

df <- structure(list(column1 = c("STELLARN714WPUR", "STELLARN714WRED", 
"STELLARN814WRED"), Column2 = c("STELLARN594WPUR,STELLARN714WPUR,STELLARN814WPUR", 
"STELLARN594WRED,STELLARN814WRED,STELLARN714WRED", "STELLARN594WRED,STELLARN714WRED,STELLARN814WRED")), 
class = "data.frame", row.names = c(NA, -3L))

Upvotes: 2

Related Questions