Reputation: 677
I have a data frame with a "Calculation" column, which could be reproduced by the following code:
a <- data.frame(Id = c(1:3), Calculation = c('[489]/100','[4771]+[4777]+[5127]+[5357]+[5597]+[1044])/[463]','[1044]/[463]'))
> str(a)
'data.frame': 3 obs. of 2 variables:
$ Id : int 1 2 3
$ Calculation: Factor w/ 3 levels "[1044]/[463]",..: 3 2 1
Please note that there are two types of numbers in the "Calculation" column: most of them are surrounded by brackets, but some (in this case the number 100) are not (this has a meaning in my application).
I would like to extract all the distinct numbers that appear in the Calculation column and return a vector with the union of these numbers. Ideally, I would also like to be able to distinguish between the numbers that are inside brackets and the numbers that are not. This step is less important (if it complicates things), since the numbers that are NOT inside brackets are few and I can detect them manually. So the desired output in this case would be:
b = c(489,4771,4777,5127,5357,5597,1044,463)
Thanks in advance
Upvotes: 1
Views: 74
Reputation: 886938
We can use str_extract_all from library(stringr). Using the regex lookbehind (?<=\\[), we match the numbers (\\d+) that are preceded by [, extract them in a list, unlist to convert it to a vector, change the character values to numeric (as.numeric), and get the unique elements.
library(stringr)
unique(as.numeric(unlist(str_extract_all(a$Calculation, '(?<=\\[)\\d+'))))
#[1] 489 4771 4777 5127 5357 5597 1044 463
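If the numbers outside brackets also need to be collected separately, the same lookaround idea can be extended. The following is only a sketch in base R, not part of the approach above: the helper name extract_numbers is hypothetical, and stringsAsFactors = FALSE is assumed so that Calculation is a character vector rather than a factor.

```r
# Sample data from the question; stringsAsFactors = FALSE is an assumption
# made here so that Calculation is stored as character
a <- data.frame(
  Id = 1:3,
  Calculation = c('[489]/100',
                  '[4771]+[4777]+[5127]+[5357]+[5597]+[1044])/[463]',
                  '[1044]/[463]'),
  stringsAsFactors = FALSE
)

# Hypothetical helper: unique numbers matched by a Perl-compatible pattern
extract_numbers <- function(x, pattern) {
  matches <- regmatches(x, gregexpr(pattern, x, perl = TRUE))
  unique(as.numeric(unlist(matches)))
}

# Digits enclosed by [ and ]
bracketed <- extract_numbers(a$Calculation, '(?<=\\[)\\d+(?=\\])')

# Digit runs that are neither preceded by [ nor followed by ];
# the \\d inside the lookarounds prevents matching part of a longer number
unbracketed <- extract_numbers(a$Calculation, '(?<![\\[\\d])\\d+(?![\\]\\d])')
```

With the sample data, bracketed holds the eight numbers from the desired output and unbracketed holds only 100.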
Upvotes: 1