Reputation: 783
I have a text string as shown below:
txt = "(2) 1G–1G (0)"
And a data frame:
DF <- data.frame(txt = c('(2) 1G–1G (0)','(1) 1G–1G (4)','(2) 1G–1G (0)'))
I am trying to extract the numbers inside the parentheses, and I want the extracted result in this format:
2 - 0
This is what I am using:
gsub('.+\\(([0-9]+)\\) 1G–1G \\(([0-9]+)\\).*$', '\\1 \\2', txt)
But what I get from it is:
"(2) 1G–1G (0)"
I am not sure where the mistake is. Can someone explain why this code does not work as intended?
Upvotes: 2
Views: 92
Reputation: 79188
I don't understand why you say it does not work:
sub(".*\\((\\d+).*\\((\\d+).*","\\1-\\2",DF$txt)
[1] "2-0" "1-4" "2-0"
or even:
transform(DF,extracted=sub(".*\\((\\d+).*\\((\\d+).*","\\1 - \\2",txt))
txt extracted
1 (2) 1G–1G (0) 2 - 0
2 (1) 1G–1G (4) 1 - 4
3 (2) 1G–1G (0) 2 - 0
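As for why the original gsub() call hands back its input: the pattern starts with .+, which needs at least one character before the first (. The string begins with (, so the regex never matches and gsub() returns the string unchanged. The replacement '\\1 \\2' would also give "2 0" rather than "2 - 0". A minimal sketch of one possible fix, keeping the rest of the original pattern and only swapping .+ for a ^ anchor (.* would work too):

```r
txt <- "(2) 1G–1G (0)"

# Original pattern: '.+' needs at least one character before the first '(',
# but txt starts with '(', so there is no match and gsub() returns txt as-is
gsub('.+\\(([0-9]+)\\) 1G–1G \\(([0-9]+)\\).*$', '\\1 \\2', txt)

# Anchoring with '^' instead of '.+' lets the match start at position 1;
# ' - ' in the replacement gives the requested "2 - 0" layout
sub('^\\(([0-9]+)\\) 1G–1G \\(([0-9]+)\\).*$', '\\1 - \\2', txt)
```

Because sub() is vectorised, passing DF$txt instead of txt should give "2 - 0" "1 - 4" "2 - 0".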
Upvotes: 1
Reputation: 43169
You could extract them in base R with regexec and regmatches, like so:
(df <- data.frame(txt = c('(2) 1G–1G (0)', '(1) 1G–1G (4)', '(2) 1G–1G (0)', 'somejunkhere'),
                  stringsAsFactors = FALSE))

getNumbers <- function(col) {
  sapply(col, function(x) {
    # Capture the digits inside the first two (...) groups
    m <- regexec("\\((\\d+)\\)[^()]*\\((\\d+)\\)", x, perl = TRUE)
    groups <- regmatches(x, m)[[1]]
    if (length(groups) == 0) {
      NA_character_  # no match: regmatches() returned character(0)
    } else {
      sprintf("%s - %s", groups[2], groups[3])
    }
  }, USE.NAMES = FALSE)
}

df$extracted <- getNumbers(df$txt)
df
This yields
txt extracted
1 (2) 1G–1G (0) 2 - 0
2 (1) 1G–1G (4) 1 - 4
3 (2) 1G–1G (0) 2 - 0
4 somejunkhere <NA>
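Since regexec() and regmatches() already accept whole character vectors, the per-element sapply() loop is probably not needed. A sketch under that assumption, using the same sample data and pattern (written with [0-9] instead of \\d and without perl = TRUE, which should be equivalent here), with vapply() pinning the result type to character:

```r
df <- data.frame(txt = c('(2) 1G–1G (0)', '(1) 1G–1G (4)', '(2) 1G–1G (0)', 'somejunkhere'),
                 stringsAsFactors = FALSE)

# One regexec() call over the whole column; each list element holds
# c(full match, group 1, group 2), or character(0) when nothing matched
groups <- regmatches(df$txt, regexec("\\(([0-9]+)\\)[^()]*\\(([0-9]+)\\)", df$txt))

# vapply() guarantees one character value per row
df$extracted <- vapply(groups, function(g) {
  if (length(g) == 0) NA_character_ else paste(g[2], "-", g[3])
}, character(1))
df
```

The extracted column should match the one printed above, including <NA> for the row without parentheses.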
Upvotes: 1
Reputation: 626689
You may use
DF$txt <- trimws(gsub("[^()–]*\\(([0-9]+)\\)[^()–]*"," \\1 ",DF$txt))
## => [1] "2 – 0" "1 – 4" "2 – 0"
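Details
[^()–]* - any 0+ chars other than (, ) and –
\\( - a ( char
([0-9]+) - Group 1: one or more digits
\\) - a ) char
[^()–]* - any 0+ chars other than (, ) and –
Each match is replaced with " \\1 ", i.e. the Group 1 digits padded with spaces. The – between the numbers is never consumed by a match, so it stays in place, and trimws() then removes the leading and trailing spaces.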
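To see what the replacement does before trimws() cleans up, the two steps can be run separately on a single value (a small illustration; the names x and step1 exist only for this example):

```r
x <- "(2) 1G–1G (0)"

# Each match becomes its Group 1 digits padded with spaces; the – sits
# between the two matches and is left untouched
step1 <- gsub("[^()–]*\\(([0-9]+)\\)[^()–]*", " \\1 ", x)
step1          # gives " 2 – 0 "

# trimws() strips the leading and trailing spaces
trimws(step1)  # gives "2 – 0"
```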
Upvotes: 1