Reputation: 11548
I have a large dataframe. Sample Data:
> df <- data.frame(MAKE = c('FORD','FORD','FORD','BMW','BMW'),
+ MODEL = c('ECO SPORT','ECO SPORT','ECO SPORT','3 SERIES','5 SERIES'),
+ VARIANT = c('ECOSPORT 1.0','ECOSPORT 1.5','ECOSPORT 1.5','E90','5 SERIES F(10)'),
+ stringsAsFactors = FALSE)
>
> df
  MAKE     MODEL        VARIANT
1 FORD ECO SPORT   ECOSPORT 1.0
2 FORD ECO SPORT   ECOSPORT 1.5
3 FORD ECO SPORT   ECOSPORT 1.5
4  BMW  3 SERIES            E90
5  BMW  5 SERIES 5 SERIES F(10)
>
I need to remove from the "VARIANT" column any text that already appears in the "MODEL" column of the same row. I first tried to identify the affected rows with the command below, but it returns no rows.
> df[df$MODEL %in% df$VARIANT,]
[1] MAKE MODEL VARIANT
<0 rows> (or 0-length row.names)
>
Could anyone tell me how to accomplish this? Expected output:
> df
  MAKE     MODEL VARIANT
1 FORD ECO SPORT     1.0
2 FORD ECO SPORT     1.5
3 FORD ECO SPORT     1.5
4  BMW  3 SERIES     E90
5  BMW  5 SERIES   F(10)
>
Upvotes: 1
Views: 68
Reputation: 887991
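The %in% operator only tests for exact, whole-string matches, so it finds nothing here. We could use str_remove_all instead: first remove the space between two letters in MODEL (so 'ECO SPORT' becomes 'ECOSPORT'), then remove that text from VARIANT.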
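library(dplyr)
library(stringr)

df %>%
  # convert any factor columns to character (a no-op with stringsAsFactors = FALSE)
  mutate_if(is.factor, as.character) %>%
  # str_trim drops the space left behind once the model text is removed
  mutate(VARIANT = str_trim(str_remove_all(VARIANT,
    # drop a space between two letters, so 'ECO SPORT' becomes 'ECOSPORT'
    str_remove(MODEL, '(?<=[A-Z]) (?=[A-Z])'))))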
#  MAKE     MODEL VARIANT
#1 FORD ECO SPORT     1.0
#2 FORD ECO SPORT     1.5
#3 FORD ECO SPORT     1.5
#4  BMW  3 SERIES     E90
#5  BMW  5 SERIES   F(10)
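If you would rather not depend on dplyr/stringr, the same idea can be sketched in base R. This is only a sketch under a few assumptions: strip_model is a hypothetical helper name (not part of any package), the model text is matched literally with fixed = TRUE so that characters such as parentheses are not read as regex syntax, and both the original model name and a space-free copy of it are tried.

```r
df <- data.frame(
  MAKE    = c("FORD", "FORD", "FORD", "BMW", "BMW"),
  MODEL   = c("ECO SPORT", "ECO SPORT", "ECO SPORT", "3 SERIES", "5 SERIES"),
  VARIANT = c("ECOSPORT 1.0", "ECOSPORT 1.5", "ECOSPORT 1.5", "E90", "5 SERIES F(10)"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: remove one model name from one variant string.
strip_model <- function(variant, model) {
  # Try the model as written and with its spaces removed
  # (e.g. "ECO SPORT" and "ECOSPORT").
  candidates <- unique(c(model, gsub(" ", "", model, fixed = TRUE)))
  for (p in candidates) {
    # fixed = TRUE matches the text literally instead of as a regex
    variant <- gsub(p, "", variant, fixed = TRUE)
  }
  # Drop whitespace left at either end after the removal
  trimws(variant)
}

# Apply the helper row by row, pairing each VARIANT with its own MODEL
df$VARIANT <- mapply(strip_model, df$VARIANT, df$MODEL, USE.NAMES = FALSE)
df$VARIANT
```

Unlike the regex version, this removes every space from the model name before comparing, so it would also turn a model like '3 SERIES' into '3SERIES' as a second candidate. Whether that is desirable depends on how the real data is written.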
Upvotes: 1