Chris Ruehlemann

Reputation: 21432

How to standardize the placement of the $ char in strings

I have strings containing dollar amounts. The dollar sign $, however, is placed inconsistently: either before or after the amount, and with or without whitespace in between:

x <- c("$749", "12$", "555 $", "$ 1.50", "$ 66,198")

I'd like to standardize the placement of $ such that it consistently occurs immediately before the amount. To this end I've been using sub and backreference in this way:

sub("([^$\\s]*)(\\$)", "\\2\\1", x) # or: sub("(.*)(\\$)", "\\2\\1", x)
[1] "$749"     "$12"      "$555 "    "$ 1.50"   "$ 66,198"

The result is only partially successful: a trailing whitespace character remains in the third string, and whitespace still separates the $ from the amount in the last two. I can get rid of these by using nested sub:

sub("\\s", "",  sub("([^$]*)(\\$)", "\\2\\1", x))
[1] "$749"    "$12"     "$555"    "$1.50"   "$66,198"

Now the result is the one desired. But isn't there a more direct regex, one that does not require the detour via nested sub?

Upvotes: 0

Views: 57

Answers (5)

Chris Ruehlemann

Reputation: 21432

Here's one more answer, based on @Sindri's answer but slightly more compact:

sub(".*?([0-9.,]+).*", "$\\1", x) 
[1] "$749"    "$12"     "$555"    "$1.50"   "$66,198"

Upvotes: 0

Wiktor Stribiżew

Reputation: 627044

Assuming there can be only one $ char and at most two non-whitespace chunks in these currency-only strings, you can use a single gsub coupled with trimws:

x <- c("$749", "12$", "555 $", "$ 1.50", "$ 66,198")
trimws(gsub("(.*)(\\$)|\\s+", "\\2\\1", x))
## => [1] "$749"    "$12"     "$555"    "$1.50"   "$66,198"

See an R demo and the regex demo.

Regex details

  • (.*) - Group 1: any zero or more chars, as many as possible
  • (\$) - Group 2: a $ char
  • | - or
  • \s+ - one or more whitespace chars.

The $ is swapped with the text before it, and any whitespace after the $ is removed by the second alternative. Whitespace that originally preceded the $ (as in "555 $") is captured in Group 1 and ends up at the end of the string, hence the use of trimws.
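To see why trimws is needed, it may help to look at the intermediate result of the gsub call before trimming. This is only a sketch on the sample vector from the question; the variable names step1 and result are made up for illustration.

```r
x <- c("$749", "12$", "555 $", "$ 1.50", "$ 66,198")

# Step 1: swap "<text>$" to "$<text>" and delete any remaining whitespace runs.
# When the \\s+ alternative matches, both groups are empty, so the match is dropped.
step1 <- gsub("(.*)(\\$)|\\s+", "\\2\\1", x)
print(step1)   # the third element still carries a trailing space: "$555 "

# Step 2: trim the whitespace that was moved to the end of the string.
result <- trimws(step1)
print(result)
```

The trailing space appears only where whitespace sat between the amount and a following $, because that whitespace is part of Group 1 and is moved along with the amount.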

Upvotes: 1

s_baldur

Reputation: 33508

Here is a simple solution:

paste0("$", gsub(" |\\$", "", x))
# [1] "$749"    "$12"     "$555"    "$1.50"   "$66,198"

If you want to rely solely on regex:

sub("(.*?)(\\$)", "\\2\\1", gsub("\\s", "", x))
# or 
gsub(".*?(\\d+[.,]*\\d*).*", "$\\1", x)
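As a quick sanity check, the paste0 approach and the first regex-only variant can be compared on the sample vector. The sketch below wraps the paste0 approach in a helper; the name standardize_dollar is hypothetical and not part of any package.

```r
x <- c("$749", "12$", "555 $", "$ 1.50", "$ 66,198")

# Hypothetical helper: remove spaces and "$", then prepend a single "$".
standardize_dollar <- function(s) paste0("$", gsub(" |\\$", "", s))

a <- standardize_dollar(x)

# Regex-only variant: strip whitespace first, then move "$" to the front.
b <- sub("(.*?)(\\$)", "\\2\\1", gsub("\\s", "", x))

print(a)
print(identical(a, b))  # both approaches should agree on this input
```

Note that the helper assumes every string holds exactly one amount; it would also prepend "$" to a string that contained no dollar sign at all.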

Upvotes: 1

Santiago I. Hurtado

Reputation: 1123

Here is a solution using only sub:

x <- c("$749", "12$", "555 $", "$ 1.50", "$ 66,198")

x <- sub(pattern = ' ', replacement = '', x = x, fixed = TRUE)
x <- sub(pattern = '$', replacement = '', x = x, fixed = TRUE)
x <- paste0('$',x)

Upvotes: 1

Karthik S

Reputation: 11584

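Does this work? It uses parse_number from the readr package. Note that parse_number returns numeric values, so formatting such as the trailing zero in 1.50 and the thousands separator in 66,198 is lost: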

library(readr)  # provides parse_number

x <- c("$749", "12$", "555 $", "$ 1.50", "$ 66,198")
paste0('$',parse_number(x))


[1] "$749"   "$12"    "$555"   "$1.5"   "$66198"

Upvotes: 3
