safex

Reputation: 2514

R regex delete all numerics except the ones that match this regex

I am trying to delete all digits ([0-9]) from a character vector in R, except where the digits follow a dot, with at most one digit before the dot. In other words, I want to delete every digit that is not part of a match of the regular expression [0-9]{0,1}\.[0-9]{1,20}. For example, if I have

test = 'ab 300 0.091% bab 200 x'

using [0-9]{0,1}\.[0-9]{1,20}, I want:

'ab 0.091% bab x'

but I fail to see how I can tell gsub to drop all digits except those matched by a given regex. Simply removing every digit does not work:

gsub("[0-9]", "",test)
[1] "ab  .% bab  x"

because this also strips the digits from the middle part (0.091), which I wanted to keep.

Upvotes: 1

Views: 198

Answers (3)

The fourth bird

Reputation: 163352

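If you want to delete all the numbers that do not match your pattern, you could use an alternation: a capturing group matches what you want to keep, and the other alternative matches what you want to remove.

In the replacement, use group 1, so kept matches are written back and everything else is replaced with an empty string.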

([0-9]{0,1}\.[0-9]{1,20})|[0-9]+\s*

Note that [0-9]{0,1} can also be written as [0-9]?

Explanation

  • ([0-9]{0,1}\.[0-9]{1,20}) Capture group 1, match what you want to keep
  • | Or
  • [0-9]+\s* Match what you want to remove, 1+ digits followed by 0+ whitespace chars


For example

test = 'ab 300 0.091% bab 200 x'
gsub("([0-9]{0,1}\\.[0-9]{1,20})|[0-9]+\\s*", "\\1", test)

Output

[1] "ab 0.091% bab x"
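
Since gsub is vectorised, the same replacement could be wrapped in a small helper and applied to a whole character vector. This is only a sketch: the function name drop_plain_numbers and the second input string are made up for illustration, and the pattern is the one above with [0-9]? in place of [0-9]{0,1}.

```r
# Hypothetical helper (the name is illustrative, not part of the original answer).
drop_plain_numbers <- function(x) {
  # Group 1 keeps an optional digit, a dot, and 1 to 20 digits;
  # the second alternative matches plain digit runs plus trailing whitespace.
  gsub("([0-9]?\\.[0-9]{1,20})|[0-9]+\\s*", "\\1", x)
}

inputs <- c("ab 300 0.091% bab 200 x", "no digits here")
drop_plain_numbers(inputs)
```

Strings without any digits should pass through unchanged, because neither alternative can match in them.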

Upvotes: 1

d.b

Reputation: 32548

gsub(pattern = "(?<!\\d|\\.)\\d+(?!\\.)\\s?",
     replacement = "",
     x = test,
     perl = TRUE)
#[1] "ab 0.091% bab x"
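
The pattern relies on a lookbehind and a lookahead, which is why perl = TRUE is needed. As a sketch of how it can be read, the pieces are spelled out below; the variable names are only for illustration and the assembled pattern is the same string as above.

```r
test <- "ab 300 0.091% bab 200 x"

# Pieces of the pattern, named only for illustration.
not_after_digit_or_dot <- "(?<!\\d|\\.)"  # lookbehind: no digit or dot just before
digits                 <- "\\d+"          # the digits to delete
not_before_dot         <- "(?!\\.)"       # lookahead: no dot right after the matched digits
optional_space         <- "\\s?"          # also drop one trailing whitespace character

pattern <- paste0(not_after_digit_or_dot, digits, not_before_dot, optional_space)
result <- gsub(pattern, "", test, perl = TRUE)
result
```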


Upvotes: 0

NGeorgescu

Reputation: 27

Could you add word-boundary anchors (\b), as in \b\d+?\b, to eliminate all numerics that don't have a decimal or other characters next to them (or potentially \b[\d\.]+\b)?
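
One way to find out is to try it on the string from the question. The sketch below assumes perl = TRUE, so that \d and the lazy +? behave as in PCRE. Because a word boundary also sits between a digit and a dot, \b on its own does not appear to protect the decimal, so treat this as a check rather than a solution.

```r
test <- "ab 300 0.091% bab 200 x"
wanted <- "ab 0.091% bab x"

# \b matches between a word character (such as a digit) and a non-word
# character (such as '.'), so digits on either side of the dot qualify too.
with_boundaries <- gsub("\\b\\d+?\\b", "", test, perl = TRUE)

# Compare against the output the question asks for.
identical(with_boundaries, wanted)
```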

Upvotes: 0
