Karthik S

Reputation: 11548

How to remove a string from one column when it also appears in another column in R

I have a large dataframe. Sample Data:

> df <- data.frame(MAKE = c('FORD','FORD','FORD','BMW','BMW'),
+                  MODEL = c('ECO SPORT','ECO SPORT','ECO SPORT','3 SERIES','5 SERIES'),
+                  VARIANT = c('ECOSPORT 1.0','ECOSPORT 1.5','ECOSPORT 1.5','E90','5 SERIES F(10)'),
+                  stringsAsFactors = FALSE)
> 
> df
  MAKE     MODEL        VARIANT
1 FORD ECO SPORT   ECOSPORT 1.0
2 FORD ECO SPORT   ECOSPORT 1.5
3 FORD ECO SPORT   ECOSPORT 1.5
4  BMW  3 SERIES            E90
5  BMW  5 SERIES   5 SERIES F(10)
> 

I need to find and remove the part of each string in the "VARIANT" column that also appears in the "MODEL" column of the same row. I first tried to identify the matching rows with the command below, but it returns no rows.

> df[df$MODEL %in% df$VARIANT,]
[1] MAKE    MODEL   VARIANT
<0 rows> (or 0-length row.names)
> 

Could anyone let me know how to accomplish this? Expected output:

> df
  MAKE     MODEL        VARIANT
1 FORD  ECO SPORT         1.0
2 FORD  ECO SPORT         1.5
3 FORD  ECO SPORT         1.5
4  BMW   3 SERIES         E90
5  BMW   5 SERIES         F(10)
> 

Upvotes: 1

Views: 68

Answers (1)

akrun

Reputation: 887991

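We could use str_remove_all from stringr. The inner str_remove drops the space between two letters in MODEL (so 'ECO SPORT' becomes 'ECOSPORT', while '3 SERIES' is left alone), and the result is then removed from VARIANT row by row.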

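library(dplyr)
library(stringr)

df %>%
    # make sure any factor columns are plain character vectors
    mutate_if(is.factor, as.character) %>%
    # strip the MODEL text (space between letters removed) from VARIANT,
    # then trim the whitespace left behind by the removal
    mutate(VARIANT = str_trim(str_remove_all(VARIANT,
              str_remove(MODEL, '(?<=[A-Z]) (?=[A-Z])'))))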
#  MAKE     MODEL VARIANT
#1 FORD ECO SPORT     1.0
#2 FORD ECO SPORT     1.5
#3 FORD ECO SPORT     1.5
#4  BMW  3 SERIES     E90
#5  BMW  5 SERIES   F(10)
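
For reference, the original attempt returns zero rows because %in% compares whole strings: it asks whether a MODEL value is exactly equal to some VARIANT value, not whether it occurs inside one. Below is a rough base R sketch of the same idea without extra packages. It is only an illustration under a few assumptions: strip_model is a hypothetical helper name (not part of any package), it matches the model text literally via fixed = TRUE rather than as a regular expression, and it tries the model both as written and with all spaces removed, which is slightly broader than the letter-only rule used above.

```r
# Sample data from the question
df <- data.frame(
  MAKE    = c("FORD", "FORD", "FORD", "BMW", "BMW"),
  MODEL   = c("ECO SPORT", "ECO SPORT", "ECO SPORT", "3 SERIES", "5 SERIES"),
  VARIANT = c("ECOSPORT 1.0", "ECOSPORT 1.5", "ECOSPORT 1.5", "E90", "5 SERIES F(10)"),
  stringsAsFactors = FALSE
)

# %in% tests exact equality of whole elements, so nothing matches here
any(df$MODEL %in% df$VARIANT)
# [1] FALSE

# Hypothetical helper (an assumption for illustration): remove the model
# text from one variant string, treating the model as a literal string.
strip_model <- function(variant, model) {
  # Try the model as written and with its spaces removed
  # ("ECO SPORT" and "ECOSPORT")
  candidates <- unique(c(model, gsub(" ", "", model, fixed = TRUE)))
  for (cand in candidates) {
    variant <- sub(cand, "", variant, fixed = TRUE)
  }
  # Drop the whitespace left behind by the removal
  trimws(variant)
}

# Apply the helper row by row, pairing each VARIANT with its own MODEL
df$VARIANT <- unname(mapply(strip_model, df$VARIANT, df$MODEL))
df$VARIANT
# [1] "1.0"   "1.5"   "1.5"   "E90"   "F(10)"
```

Matching literally means a model name containing characters such as ( or . would not be misread as regex syntax; whether stripping every space is appropriate depends on how the real data is spelled.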

Upvotes: 1
