Brian Kirkham

Reputation: 15

How to remove all numbers and commas from a string except any number immediately preceded by $ using R?

I would like to remove all numbers and commas from a string except any number that is immediately preceded by $ and immediately followed by a comma.

For example, I have:

str = "1, $100-$1,000 2, $1001-$10,000 3, $10,001-$100,000"

I would like to obtain the following:

"$100-$1,000  $1001-$10,000  $10,001-$100,000"

I have tried to use gsub with a negative lookbehind

new_str = gsub("(?<!\\$)[0-9]*,", "", str)

However, this gives the following error message:

Error in gsub("(?<!\\$)[0-9]*,", "", str) : invalid regular expression '(?<!\$)[0-9]*,', reason 'Invalid regexp'

It seems that the negative lookbehind is incorrectly coded, but I can't seem to figure out why. Any help is much appreciated!

Upvotes: 0

Views: 246

Answers (2)

G. Grothendieck

Reputation: 269854

1) This gives the desired answer in the case of the sample string:

gsub("\\d+, ", "", str)
## [1] "$100-$1,000 $1001-$10,000 $10,001-$100,000"

Unescaped regular expression: "\d+, "
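On the error in the question: R's default regex engine (TRE) does not support lookarounds, so a lookbehind only compiles when perl = TRUE is passed. The sketch below assumes the same str as in the question. The character class inside the lookbehind is a variation introduced here, not the pattern from the question, because the original pattern would also strip commas inside the dollar amounts once it compiles.

```r
str <- "1, $100-$1,000 2, $1001-$10,000 3, $10,001-$100,000"

# perl = TRUE switches gsub() to PCRE, which understands (?<!...).
# The lookbehind rejects digit runs preceded by "$", another digit, or a comma,
# so only the standalone labels ("1, ", "2, ", "3, ") are removed.
new_str <- gsub("(?<![$\\d,])\\d+,\\s*", "", str, perl = TRUE)
new_str
## [1] "$100-$1,000 $1001-$10,000 $10,001-$100,000"
```

For this sample the simpler pattern in 1) is enough; the lookbehind version may be worth it only if a label could appear directly after other text without a separating space.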

2) Here is a second approach:

library(gsubfn)

paste(strapplyc(str, "(\\$\\S+)", simplify = c), collapse = " ")
## [1] "$100-$1,000 $1001-$10,000 $10,001-$100,000"

Unescaped regular expression: "(\$\S+)"
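If adding the gsubfn dependency is not desirable, the same extract-and-paste idea can probably be written in base R. This is a minimal sketch, assuming the same str as in the question; the variable names amounts and result are illustrative only.

```r
str <- "1, $100-$1,000 2, $1001-$10,000 3, $10,001-$100,000"

# gregexpr() locates every run that starts with "$" and continues up to the
# next whitespace; regmatches() pulls those runs out as a character vector.
amounts <- regmatches(str, gregexpr("\\$\\S+", str))[[1]]

# Join the extracted ranges back into a single string.
result <- paste(amounts, collapse = " ")
result
## [1] "$100-$1,000 $1001-$10,000 $10,001-$100,000"
```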

Upvotes: 1

alpha bravo

Reputation: 7948

You could use this pattern:

(\$[0-9,-]+)|\d+,\s

and replace each match with \1, which keeps the captured dollar amounts and drops the numbered labels.
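Applied in R, the pattern above needs its backslashes doubled inside the string literal. A minimal sketch, assuming the same str as in the question and the default regex engine:

```r
str <- "1, $100-$1,000 2, $1001-$10,000 3, $10,001-$100,000"

# First alternative: capture a dollar amount (digits, commas, hyphens) as group 1.
# Second alternative: match a numbered label such as "1, " without capturing it.
# Replacing with "\\1" re-emits the amounts; for a label match group 1 is empty,
# so the label is deleted.
cleaned <- gsub("(\\$[0-9,-]+)|\\d+,\\s", "\\1", str)
cleaned
## [1] "$100-$1,000 $1001-$10,000 $10,001-$100,000"
```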

Upvotes: 0
