Reputation: 29285
Assume we've got the following string.
str <- '<a><b><c>';
I need to split it so that the output is a vector of 'a', 'b', 'c'.
Essentially, I probably need a regex split function that takes out instances of <(*)> from the original string and adds them to a new vector.
Upvotes: 2
Views: 159
Reputation: 269461
1) strsplit/gsub Remove the < characters and then split by > like this. (One might have expected that this would leave a zero-length component at the end, but because of the way strsplit works this does not occur.) This solution is particularly short and uses no packages.
unlist(strsplit(gsub("<", "", str), ">"))
## [1] "a" "b" "c"
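The trailing-delimiter behavior mentioned above can be checked directly. This is only a small sketch with made-up input strings: strsplit drops the empty string that would follow a match at the end of the input, but keeps the one that precedes a match at the start.

```r
# A match at the end of the string does not produce a trailing "".
trailing <- strsplit("a>b>c>", ">")[[1]]
print(trailing)

# A match at the start of the string does produce a leading "".
leading <- strsplit(">a>b>c", ">")[[1]]
print(leading)
```

That asymmetry is why removing "<" first and splitting on ">" works without any cleanup.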
2) scan/chartr Replace the < and > characters with spaces and then use scan to read in what is left. This solution uses no packages and is particularly straightforward, but it depends on the fields not containing spaces:
scan(textConnection(chartr("<>", "  ", str)), what = "", quiet = TRUE)
## [1] "a" "b" "c"
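To illustrate the caveat about spaces, here is a hedged sketch using a made-up input in which one field contains a space. It uses scan's text argument instead of textConnection, and note that the second argument of chartr holds two spaces, one for each character being replaced.

```r
# Hypothetical input: the first field is "a b", which contains a space.
with_space <- "<a b><c>"

# Both delimiters become spaces, so the inner space is indistinguishable
# from a field boundary and "a b" is read as two separate fields.
fields <- scan(text = chartr("<>", "  ", with_space), what = "", quiet = TRUE)
print(fields)
```

If the fields may contain spaces, one of the other approaches is a safer choice.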
3) strapplyc strapplyc in the gsubfn package extracts the fields that match a regular expression. (The stringr package provides a similar function, and base R provides regmatches, which can also do this but a bit awkwardly.) This solution is very short but does use a package.
library(gsubfn)
strapplyc(str, "[^<>]+", simplify = c)
## [1] "a" "b" "c"
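For reference, the base R regmatches route mentioned above might look like the following. It is a sketch that reuses the same pattern as the strapplyc call and needs no packages.

```r
str <- '<a><b><c>'

# gregexpr() locates every run of characters that are neither "<" nor ">";
# regmatches() then extracts those runs. The result is a list with one
# element per input string, hence the [[1]].
fields <- regmatches(str, gregexpr("[^<>]+", str))[[1]]
print(fields)
```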
Upvotes: 2
Reputation: 6272
You can split using strsplit with the regex [<>]+ and then filter out all the empty results with lapply:
str <- '<ab><bc><cd>'
unlist(lapply(strsplit(str,"[<>]+"), function(x){x[!x ==""]}))
#[1] "ab" "bc" "cd"
Or simply remove the first (empty) element:
unlist(strsplit(str,"[<>]+"))[-1]
#[1] "ab" "bc" "cd"
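Note that dropping the first element assumes the string always starts with a delimiter. As a sketch of a variant that does not rely on that, the hypothetical helper split_fields below (not part of any package) filters out empty strings with nzchar instead:

```r
# Hypothetical helper: split on runs of "<" or ">" and keep only the
# non-empty pieces, wherever the empty ones happen to occur.
split_fields <- function(x) {
  parts <- strsplit(x, "[<>]+")[[1]]
  parts[nzchar(parts)]
}

print(split_fields('<ab><bc><cd>'))
# Works even when the leading "<" is missing, where [-1] would drop "ab".
print(split_fields('ab><bc><cd>'))
```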
Upvotes: 1
Reputation: 886948
We can use str_extract_all from the stringr package:
library(stringr)
str2 <- '<ab><bc><cd>'
str_extract_all(str2, '[a-z]+')[[1]]
#[1] "ab" "bc" "cd"
Upvotes: 1
Reputation: 83215
str <- '<a><b><c>'
str <- gsub('<|>','',str)
str <- unlist(strsplit(str,'',fixed=TRUE)) # or: strsplit(str,'',fixed=TRUE)[[1]]
gives:
> str
[1] "a" "b" "c"
In response to your comment:
str2 <- '<ab><bc><cd>'
str2 <- unlist(strsplit(str2,'><',fixed=TRUE)) # or: strsplit(str2,'><',fixed=TRUE)[[1]]
str2 <- gsub('<|>','',str2)
gives:
> str2
[1] "ab" "bc" "cd"
Upvotes: 4
Reputation: 16277
First, use gsub to replace '><' with something else. I chose a space. This is what you will strsplit on later. Then remove the remaining '>' and '<'. You can then strsplit on the space. Use unlist if needed.
str1 <- '<a><b><c>';
str1 <-gsub('><',' ',str1)
str1 <-gsub('>|<','',str1)
strsplit(str1,' ')
#"a" "b" "c"
Upvotes: 1