Reputation: 1471
I have a string
strEx <- "list(A, B, C, D)"
that I would like to store as a character vector:
[1] "A" "B" "C" "D"
I'm not very good at regex (it might be overkill here, but I will need more of it in the future), which is probably part of my problem. I have a solution, but I feel it is too much code and bad form.
It gives me what I want in the end, but after extracting the text between the parentheses I still need to split it on commas and flatten the result. This feels like too crude a way to go about it. Does anyone have a prettier solution?
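library(data.table)  # provides tstrsplit()
d <- gsub(".*\\((.*)\\).*", "\\1", strEx)  # text between the parentheses: "A, B, C, D"
d1 <- unlist(tstrsplit(d, ", ", type.convert = TRUE, fixed = TRUE))  # split on ", " and flatten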
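If you would rather avoid the data.table dependency, base R's strsplit() can handle the split step. A minimal sketch, assuming the items are separated by a comma plus optional whitespace; the names inner and items are just for illustration:

```r
strEx <- "list(A, B, C, D)"

# Keep only the text between the outer parentheses: "A, B, C, D"
inner <- sub(".*\\((.*)\\).*", "\\1", strEx)

# Split on a comma followed by optional whitespace.
# strsplit() returns a list (one element per input string), so take the first element.
items <- strsplit(inner, ",\\s*")[[1]]
print(items)
```

Because strsplit() is vectorized over its input it always returns a list; for a single string, [[1]] plays the role of the unlist() step.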
Upvotes: 0
Views: 94
Reputation: 132576
You could parse the expression like this:
# parse the string; [[1]] takes the single call it contains
pEx <- parse(text = strEx)[[1]]
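A parsed call behaves much like a list: its first element is the function name (the symbol list) and the remaining elements are its arguments (here the symbols A to D). So we drop the first element and turn everything else into characters: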
vapply(pEx[-1], as.character, FUN.VALUE = "")
#[1] "A" "B" "C" "D"
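To see why pEx[-1] works, it can help to inspect the call directly. A small sketch (the names arg_syms and chars are illustrative); it also shows all.vars(), a base R function that collects the variable names in an expression. Note that all.vars() returns unique names and also descends into nested calls, so it only coincides with the approach above for a flat list without duplicates:

```r
strEx <- "list(A, B, C, D)"
pEx <- parse(text = strEx)[[1]]

# The parsed object is a call; element 1 is the function, the rest are its arguments
print(class(pEx))     # "call"
print(pEx[[1]])       # the symbol `list`
print(length(pEx))    # 5: the function plus four arguments

# Convert the call to a plain list, drop the function, and turn the symbols into strings
arg_syms <- as.list(pEx)[-1]
chars <- vapply(arg_syms, as.character, FUN.VALUE = "")
print(chars)

# all.vars() gives the same result here (no duplicates, no nested calls)
print(all.vars(pEx))
```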
However, if you find yourself needing to parse a string (which is effectively what your regex solution does as well), some earlier step in your workflow can usually be improved: ideally you would never end up with R code stored as text that has to be parsed.
See this:
library(fortunes)
fortune(106)
#If the answer is parse() you should usually rethink the question.
# -- Thomas Lumley
# R-help (February 2005)
Upvotes: 3
Reputation: 24074
You can try eval(parse(...)) after adding quotes around each letter:
unlist(eval(parse(text=gsub("([A-Z])", "\"\\1\"", "list(A, B, C, D)"))))
#[1] "A" "B" "C" "D"
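The [A-Z] pattern quotes each uppercase letter on its own, so a multi-letter name such as AB would become "A""B" and fail to parse. A sketch of a variant, assuming every item is a name that starts with an uppercase letter (so the lowercase list is left alone); quote_and_eval is a hypothetical helper name:

```r
# Hypothetical helper: quote every name that starts with an uppercase letter,
# then evaluate the resulting list(...) call and flatten it.
# Assumption: the items are names beginning with A-Z, and `list` stays lowercase.
quote_and_eval <- function(s) {
  txt <- gsub("\\b([A-Z]\\w*)", "\"\\1\"", s, perl = TRUE)
  unlist(eval(parse(text = txt)))
}

print(quote_and_eval("list(A, B, C, D)"))
print(quote_and_eval("list(Alpha, B2, C)"))
```

Since eval() runs whatever code the string contains, this approach is only advisable for input you trust.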
If your original string has no commas, you can add a comma after each quoted letter and then remove the trailing one (the comma just before the closing parenthesis) with an additional sub step:
unlist(eval(parse(text=sub(",(?=[)])", "", gsub("([A-Z])", "\"\\1\",", "list(A B C D)"), perl=TRUE))))
# [1] "A" "B" "C" "D"
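The one-liner above can be unpacked step by step to show the intermediate strings. A sketch; the names step1, step2 and res are just for illustration:

```r
s <- "list(A B C D)"

# Step 1: quote each uppercase letter and append a comma: list("A", "B", "C", "D",)
step1 <- gsub("([A-Z])", "\"\\1\",", s)

# Step 2: drop the comma directly before ")" (a lookahead, hence perl = TRUE)
step2 <- sub(",(?=[)])", "", step1, perl = TRUE)

# Step 3: evaluate the resulting R code and flatten the list into a character vector
res <- unlist(eval(parse(text = step2)))

print(step1)
print(step2)
print(res)
```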
Upvotes: 2
Reputation: 626689
Your two-step approach is good and readable. If you want to grab the items inside the list(...) block in one go, you can use a PCRE regex with the \G and \K operators in base R functions:
g <- unlist(regmatches(strEx, gregexpr("(?:list\\(\\s*|(?!^)\\G(?:,\\s*)?)\\K[^,)]+", strEx, perl=TRUE)))
g
#[1] "A" "B" "C" "D"
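Packaged as a function, the same pattern also handles multi-character items and irregular spacing after the commas. A sketch; extract_list_items is a hypothetical name. Note that whitespace before a comma or before the closing parenthesis would end up inside the extracted item, since [^,)]+ does not exclude spaces:

```r
# Hypothetical helper wrapping the \G / \K pattern from the answer
extract_list_items <- function(s) {
  pattern <- "(?:list\\(\\s*|(?!^)\\G(?:,\\s*)?)\\K[^,)]+"
  m <- gregexpr(pattern, s, perl = TRUE)
  # regmatches() returns one character vector per input string; unlist() flattens them
  unlist(regmatches(s, m))
}

print(extract_list_items("list(A, B, C, D)"))
print(extract_list_items("list(Alpha,Beta,  Gamma)"))
```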
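Pattern details:
- (?:list\\(\\s*|(?!^)\\G(?:,\\s*)?) - either the substring list( followed by zero or more whitespace characters (the list\\(\\s* part), or the end of the previous successful match (the (?!^)\\G part; the (?!^) lookahead stops \G from matching at the very start of the string) followed by an optional sequence of a comma and zero or more whitespace characters (the (?:,\\s*)? part)
- \\K - omit the text matched so far from the reported match
- [^,)]+ - one or more characters other than , and )
See the regex demo online.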
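To see what each piece of the pattern contributes, one can remove it and compare the matches. A small sketch; the variable names are illustrative:

```r
s <- "list(A, B, C, D)"
full    <- "(?:list\\(\\s*|(?!^)\\G(?:,\\s*)?)\\K[^,)]+"
no_k    <- "(?:list\\(\\s*|(?!^)\\G(?:,\\s*)?)[^,)]+"   # \K removed
no_anch <- "(?:list\\(\\s*|\\G(?:,\\s*)?)\\K[^,)]+"     # (?!^) removed

# Without \K, the consumed prefix ("list(" or ", ") stays in each match
with_k    <- regmatches(s, gregexpr(full, s, perl = TRUE))[[1]]
without_k <- regmatches(s, gregexpr(no_k, s, perl = TRUE))[[1]]

# Without (?!^), \G can match at the very start of a string that has no "list(" at all
plain <- "A, B"
guarded   <- regmatches(plain, gregexpr(full, plain, perl = TRUE))[[1]]
unguarded <- regmatches(plain, gregexpr(no_anch, plain, perl = TRUE))[[1]]

print(with_k)
print(without_k)
print(guarded)
print(unguarded)
```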
Upvotes: 1