Reputation: 109874
How can I use R's regex to eliminate space(s) before period(s) unless period is followed by a digit?
Here's what I have and what I've tried:
x <- c("I have .32 dollars AKA 32 cents . ",
"I have .32 dollars AKA 32 cents . Hello World .")
gsub("(\\s+)(?=\\.+)", "", x, perl=TRUE)
gsub("(\\s+)(?=\\.+)(?<=[^\\d])", "", x, perl=TRUE)
This gives (no space before .32):
## [1] "I have.32 dollars AKA 32 cents. "
## [2] "I have.32 dollars AKA 32 cents. Hello World."
I'd like to get:
## [1] "I have .32 dollars AKA 32 cents. "
## [2] "I have .32 dollars AKA 32 cents. Hello World."
I'm saddled with gsub here, but other solutions are welcome to make the question more useful to future searchers.
Upvotes: 4
Views: 520
Reputation: 70732
You don't need a complex expression; a positive lookahead is enough here.
> gsub(' +(?=\\.(?:\\D|$))', '', x, perl=TRUE)
## [1] "I have .32 dollars AKA 32 cents. "
## [2] "I have .32 dollars AKA 32 cents. Hello World."
Explanation:
 +            # ' ' (1 or more times)
(?=           # look ahead to see if there is:
  \.          #   '.'
  (?:         #   group, but do not capture:
    \D        #     non-digits (all but 0-9)
   |          #    OR
    $         #     before an optional \n, and the end of the string
  )           #   end of grouping
)             # end of look-ahead
Note: If these space characters could be any type of whitespace, just replace ' +' with \s+ (written "\\s+" inside an R string).
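As a minimal sketch of that note, assuming a made-up input y (not from the question) where a tab or a newline sits before the period, the \s+ variant of the same lookahead would be used like this:

```r
# Hypothetical input: tabs and a newline (not just spaces) precede the periods.
y <- c("I have\t.32 dollars AKA 32 cents\t. ",
       "Hello World \n.")

# Same positive lookahead as above, with \\s+ in place of ' +'.
cleaned <- gsub("\\s+(?=\\.(?:\\D|$))", "", y, perl = TRUE)
cleaned
## [1] "I have\t.32 dollars AKA 32 cents. "
## [2] "Hello World."
```

The tab before .32 is left alone because the period there is followed by a digit.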
If you are content with using the (*SKIP)(*F) backtracking verbs, here is the correct representation:
> gsub(' \\.\\d(*SKIP)(*F)| +(?=\\.)', '', x, perl=TRUE)
## [1] "I have .32 dollars AKA 32 cents. "
## [2] "I have .32 dollars AKA 32 cents. Hello World."
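To make the verbs easier to follow, here is a rough sketch that swaps the empty replacement for a visible marker, so you can see which spaces are actually matched. The marker string and the input z are made up for illustration. The second call is a variation (not the pattern above) that uses ' +' in the first alternative, on the assumption that there may be more than one space in front of a number such as .32.

```r
x <- c("I have .32 dollars AKA 32 cents . ",
       "I have .32 dollars AKA 32 cents . Hello World .")

# ' \\.\\d(*SKIP)(*F)' consumes " .3" and then forces a failure, so that space
# is never offered to the second alternative. Only ' +(?=\\.)' produces matches.
marked <- gsub(' \\.\\d(*SKIP)(*F)| +(?=\\.)', '<cut>', x, perl = TRUE)
marked
## [1] "I have .32 dollars AKA 32 cents<cut>. "
## [2] "I have .32 dollars AKA 32 cents<cut>. Hello World<cut>."

# Hypothetical input with two spaces before .32: ' +' in the first alternative
# lets the whole run of spaces be skipped rather than removed.
z <- "I have  .32 dollars ."
kept <- gsub(' +\\.\\d(*SKIP)(*F)| +(?=\\.)', '', z, perl = TRUE)
kept
## [1] "I have  .32 dollars."
```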
Upvotes: 4
Reputation: 5271
Well, I don't know R, but I know regular expressions. Hopefully this answer works in R.
gsub("\\s+\\.(?!\\d)", ".", x, perl=TRUE)
It uses a negative lookahead to ensure that the space(s) and period are not followed by a digit; then it replaces the match with just a period.
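For anyone who wants to confirm this in R, here is a quick sketch using the question's x plus one made-up third string (several spaces before the period, and a period followed by letters):

```r
x <- c("I have .32 dollars AKA 32 cents . ",
       "I have .32 dollars AKA 32 cents . Hello World .",
       # hypothetical extra case, not from the question
       "I have  .32 dollars AKA 32 cents   . Hello World .xyz")

# Whitespace + period is matched only when no digit follows the period;
# the match is then replaced by a bare period.
out <- gsub("\\s+\\.(?!\\d)", ".", x, perl = TRUE)
out
## [1] "I have .32 dollars AKA 32 cents. "
## [2] "I have .32 dollars AKA 32 cents. Hello World."
## [3] "I have  .32 dollars AKA 32 cents. Hello World.xyz"
```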
Upvotes: 3
Reputation: 179448
Try this regex:
x <- c("I have .32 dollars AKA 32 cents . ",
"I have .32 dollars AKA 32 cents . Hello World .",
"I have .32 dollars AKA 32 cents . Hello World .xyz")
gsub(" *\\.($|\\D)", "\\.\\1", x)
[1] "I have .32 dollars AKA 32 cents. "
[2] "I have .32 dollars AKA 32 cents. Hello World."
[3] "I have .32 dollars AKA 32 cents. Hello World.xyz"
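What it does:
" *\\." matches any number of spaces (including none) followed by a period.
"($|\\D)" then matches either the end of the string ($) or a non-digit (\\D), captured as group 1 so the replacement can put it back after the period.
Upvotes: 2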
Reputation: 887291
This seems to work for the example.
gsub("\\s(?=\\.[0-9])(*SKIP)(*F)|(\\s+)(?=\\.+)(?<=[^\\d])", "", x, perl=TRUE)
#[1] "I have .32 dollars AKA 32 cents. "
#[2] "I have .32 dollars AKA 32 cents. Hello World."
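One observation, offered as a sketch rather than a definitive simplification: the trailing (?<=[^\\d]) is checked right after \\s+ has matched, so the character behind it is always whitespace and the lookbehind should always succeed. On the example data, dropping it gives the same result:

```r
x <- c("I have .32 dollars AKA 32 cents . ",
       "I have .32 dollars AKA 32 cents . Hello World .")

# Pattern as posted above.
with_lookbehind <- gsub("\\s(?=\\.[0-9])(*SKIP)(*F)|(\\s+)(?=\\.+)(?<=[^\\d])",
                        "", x, perl = TRUE)

# Same pattern without the lookbehind and without the capture group.
without_lookbehind <- gsub("\\s(?=\\.[0-9])(*SKIP)(*F)|\\s+(?=\\.+)",
                           "", x, perl = TRUE)

identical(with_lookbehind, without_lookbehind)
## [1] TRUE
```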
Upvotes: 2