WCMC

Reputation: 1791

R: how to split string correctly when there are multiple signs

How to use R to split a string so that the following desired result can be achieved?

"A++" => "A" "" ""
"A+B+" => "A" "B" ""
"A+B+C" => "A" "B" "C"
"A++C" => "A" "" "C"
"++C" => "" "" "C"

I tried strsplit(), but strsplit("A++","\\+")[[1]] returns "A" "", which is missing one "".

Upvotes: 1

Views: 198

Answers (2)

Tanner33

Reputation: 140

If you always want to keep capital letters you could try the following.

x <- unlist(strsplit("A++", ""), use.names = FALSE)

for (j in seq_along(x)) {
  # blank out anything that is not a capital letter
  if (!(x[j] %in% LETTERS)) x[j] <- ""
}

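First split the string into single characters with strsplit and flatten the resulting list into a vector with unlist. Then keep every capital letter and replace every other character with "" (R provides the built-in vectors LETTERS and letters, in case you need lowercase as well). Note that this works one character at a time, so "A+B+C" gives "A" "" "B" "" "C" rather than "A" "B" "C".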
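The loop can also be written as a single vectorized assignment. This is only a sketch of the same character-by-character idea, not a different algorithm, so it shares the same behavior on inputs such as "A+B+C":

```r
# Split into single characters, then blank out everything that is not
# a capital letter using logical indexing instead of an explicit loop.
x <- strsplit("A++", "")[[1]]
x[!(x %in% LETTERS)] <- ""
x
# [1] "A" ""  ""
```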

Upvotes: 0

tim

Reputation: 901

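The strsplit function from base R is somewhat limited: when the delimiter matches at the very end of the string, it does not return a trailing empty string, which is why "A++" yields only "A" "". Try the stringr or stringi packages instead. For example: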

library(stringr)
str_split("A++", "\\+")

This returns the result you want:

[[1]]
[1] "A" ""  "" 

str_split is vectorized over both the input string and the match pattern.
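Because of that vectorization, all five strings from the question can be split in one call. The sketch below assumes stringr is installed; the variable names are illustrative. It uses fixed("+") instead of the escaped regex, and for comparison adds a base R workaround that is not part of the answer above: appending one extra delimiter, so that the only piece strsplit omits is the one after the added "+".

```r
library(stringr)

inputs <- c("A++", "A+B+", "A+B+C", "A++C", "++C")

# fixed("+") matches a literal plus sign, so no regex escaping is needed.
parts <- str_split(inputs, fixed("+"))
names(parts) <- inputs
parts

# Base R alternative (an assumption, not from the answer): add a sentinel
# delimiter so the piece strsplit() leaves out is the one after the sentinel.
base_parts <- strsplit(paste0(inputs, "+"), "+", fixed = TRUE)
base_parts
```

Both calls return a list with one character vector per input string, each of length three for the inputs shown.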

Upvotes: 1
