ErrantBard

Reputation: 1471

R - string manipulation and extraction

I have a string strEx <- "list(A, B, C, D)" that I would like to store as a character vector:

[1] "A" "B" "C" "D"

I'm not very good at regex (it might be overkill here, but I will need more of it in the future), which is probably part of my problem. I have a solution, but I feel it is too much code and bad form.

It gives me what I want in the end, but I still need to split the result on commas and flatten it. This feels like too crude a way to go about it. Does anyone have a prettier solution?

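library(data.table)  # tstrsplit() comes from data.table
d <- gsub(".*\\((.*)\\).*", "\\1", strEx)  # keep only the text between the parentheses: "A, B, C, D"
d1 <- unlist(tstrsplit(d, ", ", type.convert = TRUE, fixed = TRUE))  # split on ", " and flatten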
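For comparison, here is a base-R sketch of the same two steps that drops the data.table dependency: strsplit() takes the place of tstrsplit(), and indexing with [[1]] takes the place of unlist(). It assumes strEx always holds a single string with exactly one pair of parentheses.

```r
strEx <- "list(A, B, C, D)"

# Step 1: keep only what is between the parentheses -> "A, B, C, D"
inner <- sub(".*\\((.*)\\).*", "\\1", strEx)

# Step 2: split on a comma followed by optional whitespace; strsplit() returns
# a list with one element per input string, so take the first (and only) one
vec <- strsplit(inner, ",\\s*")[[1]]
vec
#> [1] "A" "B" "C" "D"
```

sub() is enough here because there is only one match per string; gsub() would give the same result.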

Upvotes: 0

Views: 94

Answers (3)

Roland

Reputation: 132576

You could parse the expression like this:

#parse the expression
pEx <- parse(text = strEx)[[1]] 

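The result is a call, and calls are list-like: the first element is the function name (list) and the remaining elements are its arguments, here the symbols A to D. So we can drop list and turn the remaining symbols into character strings: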

vapply(pEx[-1], as.character, FUN.VALUE = "")
#[1] "A" "B" "C" "D"
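A sketch of the same idea wrapped in a small function. call_args() is a hypothetical name, and the function assumes R >= 3.6.0 for str2lang(), which is shorthand for parse(text = x)[[1]]. It never evaluates the call, and it refuses anything whose arguments are not all plain symbols instead of silently converting them.

```r
# Hypothetical helper: return the argument names of a call such as list(A, B, C, D)
call_args <- function(x) {
  cl <- str2lang(x)                 # same as parse(text = x)[[1]]; needs R >= 3.6.0
  stopifnot(is.call(cl))
  args <- as.list(cl)[-1]           # drop the function name (here `list`)
  # every argument must be a bare symbol; something like 1 + 2 is rejected
  stopifnot(all(vapply(args, is.symbol, logical(1))))
  vapply(args, as.character, character(1))
}

call_args("list(A, B, C, D)")
#> [1] "A" "B" "C" "D"
```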

However, if you find yourself needing to parse a string (which your regex solution effectively does as well), the step that produced that string should usually be improved instead. Ideally, you should not end up with an expression stored as text that needs to be parsed.

See this:

library(fortunes)
fortune(106)
#If the answer is parse() you should usually rethink the question.
#   -- Thomas Lumley
#      R-help (February 2005)

Upvotes: 3

Cath

Reputation: 24074

You can try eval(parse(...)) after wrapping every letter in quotes:

unlist(eval(parse(text=gsub("([A-Z])", "\"\\1\"", "list(A, B, C, D)"))))
#[1] "A" "B" "C" "D"
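The [A-Z] pattern only quotes single capital letters, so a name like Foo would become "F"oo and fail to parse. A sketch that quotes whole words instead, assuming every argument is a bare word and the wrapper is always list (any argument literally named list would be left unquoted). As with any eval(parse(...)) approach, only use this on strings you trust, because the text is executed as R code.

```r
strEx <- "list(Foo, Bar, Baz)"

# Quote every word except the function name `list`; (?!list\\b) is a
# negative lookahead, which is why perl = TRUE is needed
quoted <- gsub("\\b(?!list\\b)(\\w+)\\b", "\"\\1\"", strEx, perl = TRUE)
quoted
#> [1] "list(\"Foo\", \"Bar\", \"Baz\")"

# Evaluate the now-valid R code and flatten the resulting list
unlist(eval(parse(text = quoted)))
#> [1] "Foo" "Bar" "Baz"
```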

If the original string has no commas, you can add a comma after each quoted letter and then remove the trailing one with an extra sub() step:

unlist(eval(parse(text=sub(",(?=[)])", "", gsub("([A-Z])", "\"\\1\",", "list(A B C D)"), perl=TRUE))))
# [1] "A" "B" "C" "D"
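If the goal is only to handle both the comma-separated and the space-separated forms, a single split can do it without eval(). A minimal sketch; split_list_string() is a hypothetical helper name, and it assumes the items themselves contain no commas or whitespace.

```r
# Hypothetical helper: items may be separated by commas, spaces, or both,
# and the string is never evaluated as code
split_list_string <- function(x) {
  inner <- sub("^\\s*list\\((.*)\\)\\s*$", "\\1", x)     # drop the list( ... ) wrapper
  parts <- strsplit(trimws(inner), "[,[:space:]]+")[[1]]  # split on runs of commas/whitespace
  parts[nzchar(parts)]                                     # drop any empty pieces
}

split_list_string("list(A, B, C, D)")
#> [1] "A" "B" "C" "D"
split_list_string("list(A B C D)")
#> [1] "A" "B" "C" "D"
```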

Upvotes: 2

Wiktor Stribiżew

Reputation: 626689

Your two-step approach is good and readable. If you want to grab the items inside a block of text in one go, you can use a PCRE regex with the \G and \K operators in base R functions (note perl=TRUE):

> g <- unlist(regmatches(strEx, gregexpr("(?:list\\(\\s*|(?!^)\\G(?:,\\s*)?)\\K[^,)]+", strEx, perl=TRUE)))
> g
[1] "A" "B" "C" "D"

Pattern details:

  • (?:list\\(\\s*|(?!^)\\G(?:,\\s*)?) - either the substring list( followed by zero or more whitespace characters (the list\\(\\s* part), or the end of the previous successful match (the (?!^)\\G part) followed by an optional comma and zero or more whitespace characters (the (?:,\\s*)? part)
  • \\K - omit the text matched so far
  • [^,)]+ - 1 or more chars other than , and ).
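The same pattern can be built for other wrapper functions. A sketch with a hypothetical helper extract_args(); it assumes fun contains no regex metacharacters, and that items have no spaces before a comma (those would be kept by [^,)]+).

```r
# Hypothetical helper: `fun` is the name of the wrapping function
extract_args <- function(x, fun = "list") {
  # fun( plus optional spaces, or the end of the previous match plus an
  # optional comma and spaces; \K drops that prefix from the returned match
  pattern <- paste0("(?:", fun, "\\(\\s*|(?!^)\\G(?:,\\s*)?)\\K[^,)]+")
  unlist(regmatches(x, gregexpr(pattern, x, perl = TRUE)))
}

extract_args("list(A, B, C, D)")
#> [1] "A" "B" "C" "D"
extract_args("c(x, y)", fun = "c")
#> [1] "x" "y"
```

Note that fun is not anchored to a word boundary, so with fun = "c" the first alternative would also match the end of a longer name such as abc(.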

See the regex demo online.

Upvotes: 1
