Reputation: 107
Here is the example database that I have.
column1 Column2
STELLARN714WPUR STELLARN594WPUR,STELLARN714WPUR,STELLARN814WPUR
STELLARN714WRED STELLARN594WRED,STELLARN814WRED,STELLARN714WRED
STELLARN814WRED STELLARN594WRED,STELLARN714WRED,STELLARN814WRED
I have to match the value's from column1 to the string in column2 & if the exact match is in found in the column2 string, the matching value needs to be removed from the string.
Eg desired output:
column1 Column2
STELLARN714WPUR STELLARN594WPUR,STELLARN814WPUR
STELLARN714WRED STELLARN594WRED,STELLARN814WRED,
STELLARN814WRED STELLARN594WRED,STELLARN714WRED,
I tried stringer and gsub, but they didn't help. Any help would be highly appreciated. Thank you.
Upvotes: 0
Views: 419
Reputation: 188
You can use str_extract
and gsub
. I also added a scenario in row 4 where column 1 value doesn't match with column 2. You can find the final output in column 3.
library(stringr)
library(dplyr)
col1 <- c("STELLARN714WPUR", "STELLARN714WRED", "STELLARN814WRED", "AB")
col2 <- c("STELLARN594WPUR,STELLARN714WPUR,STELLARN814WPUR", "STELLARN594WRED,STELLARN814WRED,STELLARN714WRED", "STELLARN594WRED,STELLARN714WRED,STELLARN814WRED", "STELLARN594WPUR,STELLARN714WPUR,STELLARN814WPUR")
df <- data.frame(column1 = col1, Column2 = col2, stringsAsFactors = FALSE)
df
column1 Column2
1 STELLARN714WPUR STELLARN594WPUR,STELLARN714WPUR,STELLARN814WPUR
2 STELLARN714WRED STELLARN594WRED,STELLARN814WRED,STELLARN714WRED
3 STELLARN814WRED STELLARN594WRED,STELLARN714WRED,STELLARN814WRED
4 AB STELLARN594WPUR,STELLARN714WPUR,STELLARN814WPUR
df %>%
mutate(match_val = str_extract(Column2, column1),
Column3 = ifelse(is.na(match_val), Column2,
str_replace(Column2, paste0(match_val,",|, ",match_val), "")))
column1 Column2 match_val
1 STELLARN714WPUR STELLARN594WPUR,STELLARN714WPUR,STELLARN814WPUR STELLARN714WPUR
2 STELLARN714WRED STELLARN594WRED,STELLARN814WRED,STELLARN714WRED STELLARN714WRED
3 STELLARN814WRED STELLARN594WRED,STELLARN714WRED,STELLARN814WRED STELLARN814WRED
4 AB STELLARN594WPUR,STELLARN714WPUR,STELLARN814WPUR <NA>
Column3
1 STELLARN594WPUR,STELLARN814WPUR
2 STELLARN594WRED,STELLARN814WRED,STELLARN714WRED
3 STELLARN594WRED,STELLARN714WRED,STELLARN814WRED
4 STELLARN594WPUR,STELLARN714WPUR,STELLARN814WPUR
Upvotes: 1
Reputation: 389355
You can use str_remove
:
library(stringr)
str_remove(df$Column2, df$column1)
#[1] "STELLARN594WPUR,,STELLARN814WPUR" "STELLARN594WRED,STELLARN814WRED,"
#[3] "STELLARN594WRED,STELLARN714WRED,"
To clean up the string from additional commas we can use gsub
and trimws
.
gsub(',{1,}', ',', trimws(str_remove(df$Column2, df$column1), whitespace = ','))
#[1] "STELLARN594WPUR,STELLARN814WPUR" "STELLARN594WRED,STELLARN814WRED"
#[3] "STELLARN594WRED,STELLARN714WRED"
Another option in base R is to split the string on comma and select the string which is not present in column1
.
df$Column2 <- mapply(function(x, y) toString(setdiff(x, y)),
strsplit(df$Column2, ','), df$column1)
df
# column1 Column2
#1 STELLARN714WPUR STELLARN594WPUR, STELLARN814WPUR
#2 STELLARN714WRED STELLARN594WRED, STELLARN814WRED
#3 STELLARN814WRED STELLARN594WRED, STELLARN714WRED
data
df <- structure(list(column1 = c("STELLARN714WPUR", "STELLARN714WRED",
"STELLARN814WRED"), Column2 = c("STELLARN594WPUR,STELLARN714WPUR,STELLARN814WPUR",
"STELLARN594WRED,STELLARN814WRED,STELLARN714WRED", "STELLARN594WRED,STELLARN714WRED,STELLARN814WRED")),
class = "data.frame", row.names = c(NA, -3L))
Upvotes: 2