POSIX regular expressions are giving me a headache.
Let's say we have a string:
a = "[question(37), question_pipe(\"Person10\")]"
and ultimately I would like to be able to have:
b = c("37", "Person10")
I've had a look at the stringr package but can't figure out how to extract the information using regular expressions and str_split.
Any help would be greatly appreciated.
Cameron
Expanding on flodel's answer, this would be the most concise solution, I think:
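library(stringr)
a <- "[question(37), question_pipe(\"Person10\")]"
# Extract every "(...)" group (non-greedy), then flatten the list into a vector
b1 <- unlist(str_extract_all(string = a, pattern = "\\(.*?\\)"))
# Drop all punctuation, i.e. the parentheses and the double quotes
b <- gsub("[[:punct:]]", "", b1) # "37" "Person10"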
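If you would rather not depend on stringr, the same idea can be sketched in base R with regmatches() and gregexpr(). This is a swapped-in technique, not part of the answer above; perl = TRUE is passed so that the lazy quantifier .*? is handled by the PCRE engine.

```r
a <- "[question(37), question_pipe(\"Person10\")]"

# Find every "(...)" group; the lazy .*? stops at the first closing parenthesis
matches <- regmatches(a, gregexpr("\\(.*?\\)", a, perl = TRUE))[[1]]
# matches is c("(37)", "(\"Person10\")")

# Strip the parentheses and double quotes, as the [[:punct:]] step above does
b <- gsub("[[:punct:]]", "", matches)
# b is c("37", "Person10")
```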
I'd do it this way:
a <- "[question(37), question_pipe(\"Person10\")]"
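# Capture the contents of question(...) and question_pipe(...) as "37,\"Person10\""
x <- gsub(".*question\\((.*)\\).*question_pipe\\((.*)\\).*", "\\1,\\2", a)
# Drop the double quotes, then split on the comma
b <- unlist(strsplit(gsub("\"", "", x), ","))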
print(b)
[1] "37" "Person10"
This should work in your specific case:
a <- "[question(37), question_pipe(\"Person10\")]"
# First split into two parts
b <- strsplit(a, ",")[[1]]
# Extract the number (skip as.integer if you want it as character)
x <- as.integer(gsub("[^0-9]","", b[[1]])) # 37
# Extract the stuff in quotes
y <- gsub(".*\"(.*)\".*", "\\1", b[[2]]) # "Person10"
An alternative for extracting everything in parentheses from the first part:
x <- gsub(".*\\((.*)\\).*", "\\1", b[[1]]) # "37"
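If you need this for more than one string, the steps above can be wrapped in a small function. The name parse_question is hypothetical, and the sketch assumes every input has exactly the form "[question(<number>), question_pipe(\"<name>\")]" with a single comma; it returns both pieces as character, like the b in the question.

```r
# Hypothetical helper wrapping the steps above; assumes the fixed input format
# "[question(<number>), question_pipe(\"<name>\")]"
parse_question <- function(s) {
  parts <- strsplit(s, ",")[[1]]
  # Everything between the parentheses in the first part
  x <- gsub(".*\\((.*)\\).*", "\\1", parts[1])
  # Everything between the double quotes in the second part
  y <- gsub(".*\"(.*)\".*", "\\1", parts[2])
  c(x, y)
}

a <- "[question(37), question_pipe(\"Person10\")]"
b <- parse_question(a)
# b is c("37", "Person10")
```

For a vector of such strings, lapply(strings, parse_question) would return one pair per string.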
If I understand correctly, you want to extract the elements within parentheses.
You can first extract those elements, including the parentheses, using str_extract_all:
library(stringr)
b1 <- str_extract_all(string = a, pattern = "\\(.*?\\)")
b1
# [[1]]
# [1] "(37)" "(\"Person10\")"
Since str_extract_all returns a list, let's turn it into a vector:
b2 <- unlist(b1)
b2
# [1] "(37)" "(\"Person10\")"
Finally, you can remove the parentheses (the first and last characters of each string) using str_sub:
b3 <- str_sub(string = b2, start = 2L, end = -2L)
b3
# [1] "37" "\"Person10\""
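The result above still keeps the double quotes around Person10. Assuming you want exactly the b from the question, a minimal extra step (not part of the original answer) is to delete them with gsub(); fixed = TRUE makes the quote character a literal match rather than a regex.

```r
library(stringr)

a <- "[question(37), question_pipe(\"Person10\")]"
# Same pipeline as above: extract "(...)" groups, flatten, drop first/last char
b3 <- str_sub(unlist(str_extract_all(a, "\\(.*?\\)")), start = 2L, end = -2L)

# Remove the literal double quotes left over from question_pipe("Person10")
b <- gsub("\"", "", b3, fixed = TRUE)
# b is c("37", "Person10")
```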
Edit: A few comments about the regex pattern: \\( and \\) are your opening and closing parentheses. .*? matches any string of characters, but non-greedily; otherwise you would get one long match from the first ( to the last ).
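To see the greedy versus non-greedy difference described above, here is a small sketch comparing the two patterns on the same string with str_extract_all:

```r
library(stringr)

a <- "[question(37), question_pipe(\"Person10\")]"

# Greedy: .* runs to the LAST ")" in the string, giving a single long match
greedy <- unlist(str_extract_all(a, "\\(.*\\)"))
# greedy is "(37), question_pipe(\"Person10\")"

# Lazy: .*? stops at the FIRST ")" after each "(", giving one match per group
lazy <- unlist(str_extract_all(a, "\\(.*?\\)"))
# lazy is c("(37)", "(\"Person10\")")
```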