Reputation: 2514
I am trying to delete all digits ([0-9]) from a character vector in R, except where there is a dot before the digits and at most one digit before the dot. In other words, I want to delete all digits that are not part of a match for the regular expression [0-9]{0,1}\.[0-9]{1,20}. For example, if I have
test = 'ab 300 0.091% bab 200 x'
using [0-9]{0,1}\.[0-9]{1,20}, I want:
'ab 0.091% bab x'
but I fail to see how I can tell gsub to drop all digits except those matching a given regex. Obviously, removing every digit does not work:
gsub("[0-9]", "",test)
[1] "ab  .% bab  x"
but now the middle part is gone, which I wanted to keep.
Upvotes: 1
Reputation: 163352
If you want to delete all the numbers that do not match your pattern, you could use an alternation: a capturing group that matches what you want to keep, or a plain match for what you want to remove.
In the replacement, use group 1, which is empty whenever the second alternative matched.
([0-9]{0,1}\.[0-9]{1,20})|[0-9]+\s*
Note that [0-9]{0,1} can also be written as [0-9]?
Explanation
([0-9]{0,1}\.[0-9]{1,20}) Capture group 1, match what you want to keep
| Or
[0-9]+\s* Match what you want to remove, 1+ digits followed by 0+ whitespace chars
For example
test = 'ab 300 0.091% bab 200 x'
gsub("([0-9]{0,1}\\.[0-9]{1,20})|[0-9]+\\s*", "\\1", test)
Output
[1] "ab 0.091% bab x"
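The same keep-or-drop alternation can be wrapped in a small function so it can be applied to a whole character vector. This is only a sketch: the function name drop_other_digits is made up for illustration, and it uses the shorter [0-9]? spelling mentioned above.

```r
# Hypothetical helper; the name is illustrative and not from the answer.
drop_other_digits <- function(x) {
  # Alternative 1 (captured): optional digit, a dot, then 1-20 digits.
  #   It is written back unchanged through the "\\1" replacement.
  # Alternative 2 (not captured): any other run of digits plus trailing
  #   whitespace. Group 1 is empty here, so the match is dropped.
  gsub("([0-9]?\\.[0-9]{1,20})|[0-9]+\\s*", "\\1", x)
}

test <- "ab 300 0.091% bab 200 x"
result <- drop_other_digits(test)
print(result)
# [1] "ab 0.091% bab x"
```

Because gsub is vectorised over x, the helper can be called on a vector of strings without a loop.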
Upvotes: 1
Reputation: 32548
gsub(pattern = "(?<!\\d|\\.)\\d+(?!\\.)\\s?",
replacement = "",
x = test,
perl = TRUE)
#[1] "ab 0.091% bab x"
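For readers unfamiliar with lookarounds, here is the same pattern split into commented pieces. It is a sketch of how the pattern can be read, not an addition to it. As far as I know, lookbehind and lookahead are not supported by R's default regex engine, which is why perl = TRUE is needed.

```r
test <- "ab 300 0.091% bab 200 x"

# The pattern from the answer, assembled piece by piece.
pattern <- paste0(
  "(?<!\\d|\\.)",  # not preceded by a digit or a dot
  "\\d+",          # a run of one or more digits
  "(?!\\.)",       # not followed by a dot
  "\\s?"           # plus at most one trailing whitespace character
)

result <- gsub(pattern, "", test, perl = TRUE)
print(result)
# [1] "ab 0.091% bab x"
```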
Upvotes: 0
Reputation: 27
Could you add \b word-boundary anchors, like \b\d+?\b, to eliminate all numerics that don't have a decimal point or other characters (or potentially \b[\d\.]+\b)?
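A quick check of both suggestions on the string from the question, as a sketch rather than a verdict: a dot is a non-word character, so \b also fires on either side of it, and the two patterns do not seem to separate 0.091 from the other numbers on their own.

```r
test <- "ab 300 0.091% bab 200 x"

# \b\d+?\b also matches the "0" and the "091" around the dot,
# so on this input it removes the same characters as "[0-9]".
with_boundaries <- gsub("\\b\\d+?\\b", "", test, perl = TRUE)
all_digits <- gsub("[0-9]", "", test)
same_result <- identical(with_boundaries, all_digits)
print(same_result)
# [1] TRUE

# \b[\d.]+\b matches integers and decimals alike.
found <- regmatches(test, gregexpr("\\b[\\d.]+\\b", test, perl = TRUE))[[1]]
print(found)
# [1] "300"   "0.091" "200"
```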
Upvotes: 0