shoonya

Reputation: 301

Remove duplicate sub-strings in R

My strings are sequences of the two-character codes "BY", "SN", "SY" and "BN", and a code can occur several times in one string, as the table below shows. I want to remove the duplicates, e.g. reduce "SNSNSNBY" to "SNBY" and "SNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNBYSN" to "SNBY".

      SNo Bars
    <dbl> <chr>
 1      1 SNSNSNBY                                                    
 2      2 SNBYSN                                                      
 3      3 BYSN                                                        
 4      4 SNBY                                                        
 5      5 SNBY                                                        
 6      6 SNBY                                                        
 7      7 BYBYSNSN                                                    
 8      8 SNBY                                                        
 9      9 BYSN                                                        
10     10 BYSN                                                        
11     11 BYSN                                                        
12     12 SNBY                                                        
13     13 SNBY                                                        
14     14 BNSY                                                        
15     15 SNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNBYSN
16     16 SYBN                                                        
17     17 BNSYBN                                                      
18     18 BNSYBNSYBNSNBNBNBNBN                                        
19     19 SNBYSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSN      
20     20 BYSN           

Upvotes: 1

Views: 209

Answers (3)

fabla

Reputation: 1816

One possible approach is to split each string into two-character substrings with strsplit(), keep only the unique substrings, and paste() them back together.

Code

# Add a whitespace every 2 characters and then split into substrings
tmp1 <- strsplit(gsub("(.{2})", "\\1 ", df$Bars), " ")

# Obtain the unique substrings and paste
df$Bars <- sapply(tmp1, function(x){
  paste0(unique(x), collapse = "")
})

df

#    SNo     Bars
# 1    1     SNBY
# 2    2     SNBY
# 3    3     BYSN
# 4    4     SNBY
# 5    5     SNBY
# 6    6     SNBY
# 7    7     BYSN
# 8    8     SNBY
# 9    9     BYSN
# 10  10     BYSN
# 11  11     BYSN
# 12  12     SNBY
# 13  13     SNBY
# 14  14     BNSY
# 15  15     SNBY
# 16  16     SYBN
# 17  17     BNSY
# 18  18 BYSYBNSN
# 19  19     SNBY
# 20  20     BYSN
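
Note that unique() drops every repeat of a code, not only adjacent ones, which is why "SNBYSN" becomes "SNBY" above. That matches the output requested in the question. If only runs of adjacent repeats should be collapsed instead, rle() is one option. The following is a minimal base-R sketch of both variants; the helper names split_pairs, dedupe_all and dedupe_runs are hypothetical, and it assumes every string has an even number of characters.

```r
# Split each string into consecutive two-character codes.
# Assumption: every string has an even number of characters.
split_pairs <- function(x) {
  regmatches(x, gregexpr(".{2}", x))
}

# Keep only the first occurrence of each code (same idea as unique() above)
dedupe_all <- function(x) {
  vapply(split_pairs(x),
         function(p) paste0(unique(p), collapse = ""),
         character(1))
}

# Collapse only runs of adjacent identical codes
dedupe_runs <- function(x) {
  vapply(split_pairs(x),
         function(p) paste0(rle(p)$values, collapse = ""),
         character(1))
}

x <- c("SNSNSNBY", "SNBYSN", "BYBYSNSN")
dedupe_all(x)   # "SNBY" "SNBY"   "BYSN"
dedupe_runs(x)  # "SNBY" "SNBYSN" "BYSN"
```

The two variants differ only for strings such as "SNBYSN", where a code reappears after a different code.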

Data

df <- read.table(text = " SNo Bars 
 1      1 SNSNSNBY                                                    
 2      2 SNBYSN                                                      
 3      3 BYSN                                                        
 4      4 SNBY                                                        
 5      5 SNBY                                                        
 6      6 SNBY                                                        
 7      7 BYBYSNSN                                                    
 8      8 SNBY                                                        
 9      9 BYSN                                                        
10     10 BYSN                                                        
11     11 BYSN                                                        
12     12 SNBY                                                        
13     13 SNBY                                                        
14     14 BNSY                                                        
15     15 SNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNBYSN
16     16 SYBN                                                        
17     17 BNSYBN                                                      
18     18 BYSYBNSNBNSNBNBNBNBN                                        
19     19 SNBYSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSN      
20     20 BYSN", header = TRUE)

Upvotes: 3

TarJae

Reputation: 78917

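You can use case_when() to recode the specific strings explicitly; every other value is left unchanged:

library(tidyverse)
# Bars is the column name from the question; only the listed strings are recoded
df1 <- df %>% 
  mutate(Bars = case_when(Bars == "SNSNSNBY" ~ "SNBY",
                          Bars == "SNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNBYSN" ~ "SNBY",
                          TRUE ~ Bars)
         )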


Upvotes: 0

Erich Zann

Reputation: 44

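One handy solution is the str_replace_all() function from stringr, a core tidyverse package. Passing a named vector replaces each listed pattern with its value:

library(dplyr)
library(stringr)

# Anchor each pattern with ^ and $ so that only whole-string matches are replaced
table <- table %>%
    mutate(Bars = str_replace_all(Bars, c("^SNSNSNBY$" = "SNBY",
    "^SNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNBYSN$" = "SNBY")))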

Upvotes: 1
