Reputation: 21432
I have strings containing dollar amounts. The dollar character $, though, is inconsistently placed: either before or after the amount, and either with or without whitespace in between:
x <- c("$749", "12$", "555 $", "$ 1.50", "$ 66,198")
I'd like to standardize the placement of $ so that it always occurs immediately before the amount. To this end I've been using sub with backreferences like this:
sub("([^$\\s]*)(\\$)", "\\2\\1", x) # or: sub("(.*)(\\$)", "\\2\\1", x)
[1] "$749" "$12" "$555 " "$ 1.50" "$ 66,198"
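Why does whitespace survive? A small diagnostic (a sketch, not part of the original post) uses base R's regexec() and regmatches() to show what Group 1 of the (.*)(\\$) variant from the comment captures for each string:

```r
x <- c("$749", "12$", "555 $", "$ 1.50", "$ 66,198")

# regexec() returns the positions of the whole match and of each group;
# regmatches() turns them into strings: element 1 is the full match,
# elements 2 and 3 are Group 1 and Group 2
m <- regmatches(x, regexec("(.*)(\\$)", x))
group1 <- vapply(m, `[`, character(1), 2)
group1
```

Only the captured text is moved behind the $. For "$ 1.50" and "$ 66,198" Group 1 is empty and the space after $ is never part of the match, so it stays where it is; for "555 $" the space is captured together with the digits and ends up trailing.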
The result is only partially successful, as whitespace remains: a trailing space in the third string and a space between $ and the amount in the last two. I can get rid of it by nesting sub calls:
sub("\\s", "", sub("([^$]*)(\\$)", "\\2\\1", x))
[1] "$749" "$12" "$555" "$1.50" "$66,198"
Now the result is the desired one. But isn't there a more direct regex, one that does not require the detour via a nested sub?
Upvotes: 0
Views: 57
Reputation: 21432
Here's one more answer, based on @Sindri's answer but slightly more compact:
sub(".*?([0-9.,]+).*", "$\\1", x)
[1] "$749" "$12" "$555" "$1.50" "$66,198"
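The same idea (keep the first run of digits, dots and commas, and put $ in front of it) can also be written as an explicit extraction instead of a substitution. A sketch using base R's regexpr() and regmatches(), assuming every string contains an amount, because regmatches() silently drops elements without a match, which would misalign the result:

```r
x <- c("$749", "12$", "555 $", "$ 1.50", "$ 66,198")

# regexpr() locates the first run of digits, dots and commas in each string;
# regmatches() extracts exactly that substring
amount <- regmatches(x, regexpr("[0-9.,]+", x))
paste0("$", amount)
```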
Upvotes: 0
Reputation: 627044
Assuming there can be only one $ char and at most two non-whitespace chunks in these currency-only strings, you can use a single gsub coupled with trimws:
x <- c("$749", "12$", "555 $", "$ 1.50", "$ 66,198")
trimws(gsub("(.*)(\\$)|\\s+", "\\2\\1", x))
## => [1] "$749" "$12" "$555" "$1.50" "$66,198"
Regex details:
- (.*) - Group 1: zero or more chars, as many as possible
- (\$) - Group 2: a $ char
- | - or
- \s+ - one or more whitespace chars.

The $ is swapped with the text before it, and then any whitespace after $ is removed. Given the nature of your strings this works; the only whitespace left over is a space captured in Group 1 (as in "555 $"), which ends up at the end of the result, hence the use of trimws.
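Splitting the one-liner into its two steps (a sketch; step1 is just an intermediate name introduced here) makes the role of trimws visible:

```r
x <- c("$749", "12$", "555 $", "$ 1.50", "$ 66,198")

# Step 1: swap "$" with the text before it and delete whitespace runs
step1 <- gsub("(.*)(\\$)|\\s+", "\\2\\1", x)
step1
# Step 2: remove the trailing space that was captured in Group 1 ("555 $")
trimws(step1)
```

Only "$555 " still carries whitespace after step 1, and it is trailing, which is exactly what trimws removes.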
Upvotes: 1
Reputation: 33508
Here is a simple solution:
paste0("$", gsub(" |\\$", "", x))
# [1] "$749" "$12" "$555" "$1.50" "$66,198"
If you want to rely solely on regex:
sub("(.*?)(\\$)", "\\2\\1", gsub("\\s", "", x))
# or
gsub(".*?(\\d+[.,]*\\d*).*", "$\\1", x)
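The regex-only variants are shown without output. A quick check (a sketch; desired, simple and regex_only are names introduced here) confirms that the first of them gives the same vector as the paste0() solution for the sample input:

```r
x <- c("$749", "12$", "555 $", "$ 1.50", "$ 66,198")
desired <- c("$749", "$12", "$555", "$1.50", "$66,198")

# paste0() solution: delete every space and "$", then prepend "$"
simple <- paste0("$", gsub(" |\\$", "", x))
# regex-only variant: delete all whitespace first, then move "$" to the front
regex_only <- sub("(.*?)(\\$)", "\\2\\1", gsub("\\s", "", x))

stopifnot(identical(simple, desired), identical(regex_only, desired))
```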
Upvotes: 1
Reputation: 1123
Here is a solution using only sub:
x <- c("$749", "12$", "555 $", "$ 1.50", "$ 66,198")
x <- sub(pattern = ' ', replacement = '', x = x, fixed = TRUE)  # remove the (first) space
x <- sub(pattern = '$', replacement = '', x = x, fixed = TRUE)  # remove the $
x <- paste0('$',x)                                              # prepend a single $
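sub() only touches the first occurrence, so the steps above remove just one space and one $ per string. That is enough for the sample data. If inputs could contain several spaces (an assumption, e.g. "$ 1 000"), a sketch of a hypothetical helper, standardize_dollar(), applies the same steps with gsub():

```r
# Hypothetical helper: same three steps as above, but gsub() removes
# every space and every "$", not just the first one
standardize_dollar <- function(x) {
  x <- gsub(" ", "", x, fixed = TRUE)  # drop all spaces
  x <- gsub("$", "", x, fixed = TRUE)  # drop all dollar signs
  paste0("$", x)                       # put a single "$" in front
}

x <- c("$749", "12$", "555 $", "$ 1.50", "$ 66,198")
standardize_dollar(x)
standardize_dollar("$ 1 000")
```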
Upvotes: 1
Reputation: 11584
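Does this work? Using the readr package:
library(readr)
x <- c("$749", "12$", "555 $", "$ 1.50", "$ 66,198")
paste0('$',parse_number(x))
[1] "$749" "$12" "$555" "$1.5" "$66198"
Note that parse_number() returns numbers, so the trailing zero in "1.50" and the thousands separator in "66,198" are lost.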
Upvotes: 3