Reputation: 23
I am trying to take a list of strings, remove everything except capital letters, and output a list of strings without any spaces or breaks.
Unfortunately, I have been trying to use str_extract_all() but it outputs the relevent pieces of the string separated as a list of character vectors, when there was non-capital letter string elements contained in the original string.
Can anyone please suggest a way to get the desired output?
# Some example data:
a <- list("n[28.0313]MVNNGHSFNVEYDDSQDK[28.0313]AVLK[28.0313]D_+4",
"SLGKVGTRC[71.0371]CTK[28.0313]PESER_+4",
"n[28.0313]AVVQDPALK[28.0313]PLALVY_+3",
"n[28.0313]TCVADESHAGC[71.0371]EK[28.0313]_+2")
# The desired output:
list("MVNNGHSFNVEYDDSQDKAVLKD",
"SLGKVGTRCCTKPESER",
"AVVQDPALKPLALVY",
"TCVADESHAGCEK")
# What I've tried so far:
a %>% str_extract_all("[A-Z]+")
[[1]]
[1] "MVNNGHSFNVEYDDSQDK" "AVLK" "D"
[[2]]
[1] "SLGKVGTRC" "CTK" "PESER"
[[3]]
[1] "AVVQDPALK" "PLALVY"
[[4]]
[1] "TCVADESHAGC" "EK"
# Not what I want.
I need to find a way to isolate the strings and combine them, but I'm at the limit of my R knowledge.
Upvotes: 0
Views: 27
Reputation: 887028
As it is a list
of multiple elements, we can just paste it together by looping over the list
library(dplyr)
library(stringr)
library(purrr)
a %>%
str_extract_all("[A-Z]+") %>%
map_chr(str_c, collapse="")
-output
[1] "MVNNGHSFNVEYDDSQDKAVLKD" "SLGKVGTRCCTKPESER"
[3] "AVVQDPALKPLALVY" "TCVADESHAGCEK"
Or just use gsub
to match all characters other than the upper case and replace with blank
gsub("[^A-Z]+", "", a)
[1] "MVNNGHSFNVEYDDSQDKAVLKD" "SLGKVGTRCCTKPESER" "AVVQDPALKPLALVY" "TCVADESHAGCEK"
or with str_remove_all
str_remove_all(a, "[^A-Z]+")
[1] "MVNNGHSFNVEYDDSQDKAVLKD" "SLGKVGTRCCTKPESER" "AVVQDPALKPLALVY" "TCVADESHAGCEK"
The output is a vector
, which we can wrap it in a list
list(str_remove_all(a, "[^A-Z]+"))
Upvotes: 1