H_A

Reputation: 677

Extracting the numbers from the data frame

I have a data frame with a "Calculation" column, which can be reproduced with the following code:

a <- data.frame(Id = c(1:3), Calculation = c('[489]/100','[4771]+[4777]+[5127]+[5357]+[5597]+[1044])/[463]','[1044]/[463]'))

> str(a)
'data.frame':   3 obs. of  2 variables:
$ Id         : int  1 2 3
$ Calculation: Factor w/ 3 levels "[1044]/[463]",..: 3 2 1

Please note that there are two types of numbers in the "Calculation" column: most of them are surrounded by brackets, but some (in this case the number 100) are not (this has a meaning in my application).

What I would like to do is extract all the distinct numbers that appear in the Calculation column and return a vector with the union of these numbers. Ideally, I would also like to distinguish between the numbers that are between brackets and the numbers that are not. That step is not essential (if it complicates things), since the numbers that are NOT between brackets are few and I can detect them manually. So the desired output in this case would be:

b = c(489,4771,4777,5127,5357,5597,1044,463)

Thanks in advance

Upvotes: 1

Views: 74

Answers (1)

akrun

Reputation: 886938

We can use str_extract_all from library(stringr). Using the regex lookbehind ((?<=\\[)), we match the digits (\\d+) that are preceded by [. The matches come back as a list, so we unlist them into a vector, convert the character values to numeric (as.numeric), and keep the unique elements. Because the pattern requires a preceding [, the unbracketed 100 is not picked up.

library(stringr)
unique(as.numeric(unlist(str_extract_all(a$Calculation, '(?<=\\[)\\d+'))))
#[1]  489 4771 4777 5127 5357 5597 1044  463
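
If you also want to separate the bracketed numbers from the unbracketed ones, the following is a minimal base R sketch rather than a definitive solution. The variable names (x, bracketed, unbracketed) are illustrative and not from the question. The second pattern assumes that an unbracketed number never sits directly next to a bracket, and the data frame is rebuilt with stringsAsFactors = FALSE so the column is plain character.

```r
# Sample data from the question, kept as character instead of factor
a <- data.frame(
  Id = 1:3,
  Calculation = c('[489]/100',
                  '[4771]+[4777]+[5127]+[5357]+[5597]+[1044])/[463]',
                  '[1044]/[463]'),
  stringsAsFactors = FALSE
)

x <- as.character(a$Calculation)

# Bracketed numbers: digits preceded by "[" and followed by "]"
m_in <- regmatches(x, gregexpr("(?<=\\[)\\d+(?=\\])", x, perl = TRUE))
bracketed <- unique(as.numeric(unlist(m_in)))

# Unbracketed numbers: digit runs not adjacent to a bracket or another digit
# (assumption: a bare number never touches "[" or "]")
m_out <- regmatches(x, gregexpr("(?<![\\[\\d])\\d+(?![\\]\\d])", x, perl = TRUE))
unbracketed <- unique(as.numeric(unlist(m_out)))

print(bracketed)
print(unbracketed)
```

With the sample data, bracketed should hold the eight values from the desired output and unbracketed should hold only 100. The digit checks inside the negative lookarounds stop the second pattern from matching part of a longer bracketed number.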

Upvotes: 1
