Reputation: 301
I have strings built from the two-character codes "BY", "SN", "SY" and "BN". A code can repeat many times within a string, as shown in the table below. I want to reduce "SNSNSNBY" to "SNBY" and "SNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNBYSN" to "SNBY".
SNo Bars
<dbl> <chr>
1 1 SNSNSNBY
2 2 SNBYSN
3 3 BYSN
4 4 SNBY
5 5 SNBY
6 6 SNBY
7 7 BYBYSNSN
8 8 SNBY
9 9 BYSN
10 10 BYSN
11 11 BYSN
12 12 SNBY
13 13 SNBY
14 14 BNSY
15 15 SNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNBYSN
16 16 SYBN
17 17 BNSYBN
18 18 BNSYBNSYBNSNBNBNBNBN
19 19 SNBYSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSN
20 20 BYSN
Upvotes: 1
Views: 209
Reputation: 1816
One possible approach is to split each string into two-character substrings with strsplit(), keep only the unique substrings, and paste() them back together.
Code
# Add a whitespace every 2 characters and then split into substrings
tmp1 <- strsplit(gsub("(.{2})", "\\1 ", df$Bars), " ")
# Obtain the unique substrings and paste
df$Bars <- sapply(tmp1, function(x) {
  paste0(unique(x), collapse = "")
})
df
# SNo Bars
# 1 1 SNBY
# 2 2 SNBY
# 3 3 BYSN
# 4 4 SNBY
# 5 5 SNBY
# 6 6 SNBY
# 7 7 BYSN
# 8 8 SNBY
# 9 9 BYSN
# 10 10 BYSN
# 11 11 BYSN
# 12 12 SNBY
# 13 13 SNBY
# 14 14 BNSY
# 15 15 SNBY
# 16 16 SYBN
# 17 17 BNSY
# 18 18 BYSYBNSN
# 19 19 SNBY
# 20 20 BYSN
Data
df <- read.table(text = " SNo Bars
1 1 SNSNSNBY
2 2 SNBYSN
3 3 BYSN
4 4 SNBY
5 5 SNBY
6 6 SNBY
7 7 BYBYSNSN
8 8 SNBY
9 9 BYSN
10 10 BYSN
11 11 BYSN
12 12 SNBY
13 13 SNBY
14 14 BNSY
15 15 SNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNBYSN
16 16 SYBN
17 17 BNSYBN
18 18 BYSYBNSNBNSNBNBNBNBN
19 19 SNBYSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSN
20 20 BYSN", header = TRUE)
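The same idea can be sketched without the intermediate whitespace step. This is only an illustration: the helper name dedupe_pairs is hypothetical, and it assumes every code is exactly two characters long, as in the question.

```r
# Hypothetical helper: keep the first occurrence of each two-character code.
# Assumes codes are exactly two characters wide.
dedupe_pairs <- function(x) {
  vapply(x, function(s) {
    # Leave missing values and strings shorter than one code unchanged
    if (is.na(s) || nchar(s) < 2) return(s)
    starts <- seq(1, nchar(s), by = 2)
    # Cut the string into consecutive two-character pieces
    codes <- substring(s, starts, starts + 1)
    # unique() keeps the order of first appearance
    paste(unique(codes), collapse = "")
  }, character(1), USE.NAMES = FALSE)
}

dedupe_pairs(c("SNSNSNBY", "SNBYSN", "BYBYSNSN"))
# returns: "SNBY" "SNBY" "BYSN"
```

Applied to the data above this would be df$Bars <- dedupe_pairs(df$Bars). Note that it removes every repeated code, not only consecutive repeats, which is what the example "SN...BYSN" to "SNBY" in the question asks for.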
Upvotes: 3
Reputation: 78917
You can use case_when() to recode specific strings and leave all others unchanged:
library(tidyverse)
df1 <- df %>%
  mutate(Bars = case_when(
    Bars == "SNSNSNBY" ~ "SNBY",
    Bars == "SNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNBYSN" ~ "SNBY",
    TRUE ~ Bars
  ))
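If many strings need recoding this way, one base R alternative is a named lookup vector. The sketch below is an assumption-laden illustration, not part of the answer above: the names lookup, bars and hit, and the entries in the lookup, are chosen only for the example.

```r
# Illustrative lookup table: names are the exact strings to replace,
# values are their replacements (entries chosen for the example)
lookup <- c("SNSNSNBY" = "SNBY", "BYBYSNSN" = "BYSN")

bars <- c("SNSNSNBY", "SNBYSN", "BYBYSNSN", "BNSY")

# Replace only exact, whole-string matches; everything else is left untouched
hit <- bars %in% names(lookup)
bars[hit] <- unname(lookup[bars[hit]])
bars
# returns: "SNBY" "SNBYSN" "BYSN" "BNSY"
```

Like case_when(), this only handles strings that are listed explicitly, so "SNBYSN" stays as it is.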
Upvotes: 0
Reputation: 44
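One handy solution is the str_replace_all() function from stringr, a core tidyverse package. The patterns are regular expressions, so they are anchored with ^ and $ to match whole strings only:
library(tidyverse)
table <- table %>%
  mutate(Bars = str_replace_all(Bars, c("^SNSNSNBY$" = "SNBY",
    "^SNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNSNBYSN$" = "SNBY")))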
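Because the patterns are regular expressions, an unanchored pattern also matches inside a longer string. A minimal sketch of the difference, using base R's sub() as a stand-in since it matches substrings the same way; the vector x is made up for the example:

```r
# Made-up inputs: the second string contains "SNSNSNBY" as a substring
x <- c("SNSNSNBY", "SNSNSNSNBYSN")

# Unanchored: the pattern also matches inside the longer string
unanchored <- sub("SNSNSNBY", "SNBY", x)
unanchored
# returns: "SNBY" "SNSNBYSN"

# Anchored with ^ and $: only a whole-string match is replaced
anchored <- sub("^SNSNSNBY$", "SNBY", x)
anchored
# returns: "SNBY" "SNSNSNSNBYSN"
```

With several unanchored patterns, an earlier replacement can therefore alter a longer string before its own pattern is tried.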
Upvotes: 1