Reputation: 45
I am a newcomer to regular expressions, so please bear with me.
I have a string like this:
txt1 <- 'a,b,a.b,a.,1,2,1.2,1.,.,11,222,11.222,11.'
Imagine it comes from a .csv file in which each cell is separated by ','. I would like to remove every '.' except those marking decimal points. In the end, I'd like to end up with something like this:
txt2 <- 'a,b,ab,a,1,2,1.2,1,,11,222,11.222,11'
I have tried the following code:
txt2 <- gsub(pattern = '[^a-z0-9,(\\d\\.\\d)]', replacement = '', x = txt1)
txt2 <- gsub(pattern = '[^a-z0-9,|(\\d\\.\\d)]', replacement = '', x = txt1)
But neither works; both return the string unchanged:
> print(txt2)
[1] "a,b,a.b,a.,1,2,1.2,1.,.,11,222,11.222,11."
Any idea how I might correct my code? Thanks!
Upvotes: 3
Views: 946
Reputation: 1647
The key is to use a negative lookbehind, (?<!...), and a negative lookahead, (?!...):
> txt1 <- 'a,b,a.b,a.,1,2,1.2,1.,.,11,222,11.222,11.'
> txt2 <- gsub(pattern='((?<![0-9])\\.)|(\\.(?![0-9]))', replacement='', x=txt1, perl=TRUE)
> txt2
[1] "a,b,ab,a,1,2,1.2,1,,11,222,11.222,11"
This pattern matches a period (\\.) that is not preceded by a digit 0-9, or a period that is not followed by a digit 0-9. You have to set perl=TRUE for R to recognize the lookbehind and lookahead.
This also trims leading periods, so '.2' becomes '2'. If that is not wanted, change the lookbehind to (?<![0-9,]).
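To make the difference between the two lookbehinds concrete, here is a minimal sketch. The input txt3 and the helper strip_dots() are made up for illustration and are not part of the question; only the two patterns come from the answer above.

```r
# Hypothetical input with leading-decimal cells ('.2', '.5'); not from the question.
txt3 <- 'a.b,.2,1.,.,3.14,x,.5'

# Hypothetical helper: remove periods that are not decimal points.
# 'behind' is the character class placed inside the negative lookbehind.
strip_dots <- function(x, behind = '[0-9]') {
  pattern <- paste0('((?<!', behind, ')\\.)|(\\.(?![0-9]))')
  gsub(pattern = pattern, replacement = '', x = x, perl = TRUE)
}

strict  <- strip_dots(txt3)                     # uses (?<![0-9])
lenient <- strip_dots(txt3, behind = '[0-9,]')  # uses (?<![0-9,])

print(strict)   # "ab,2,1,,3.14,x,5"
print(lenient)  # "ab,.2,1,,3.14,x,.5"
```

One caveat with the lenient variant: a '.2' at the very start of the string has no character before it, so the lookbehind still succeeds and the period is removed. If the first cell can start with a period, that case needs separate handling.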
Upvotes: 4
Reputation: 50668
Negative lookahead (as suggested by @CAustin) seems to be the most elegant and concise.
Since none of the above solutions give you the actual R code, here it is:
txt2 <- gsub("\\.(?!\\d)", "", txt1, perl = TRUE)
[1] "a,b,ab,a,1,2,1.2,1,,11,222,11.222,11"
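The lookahead-only pattern gives the same result as the two-sided pattern on txt1, but the two can differ on other inputs. A small sketch, using a made-up string (edge is hypothetical, not from the question), shows where they diverge:

```r
# Hypothetical edge cases: letter before a digit, leading decimal,
# digit before a letter, and an ordinary decimal.
edge <- 'a.1,.5,1.a,1.5'

# Lookahead only: drop a period unless a digit follows it.
lookahead_only <- gsub('\\.(?!\\d)', '', edge, perl = TRUE)

# Two-sided: drop a period unless digits sit on both sides.
two_sided <- gsub('((?<![0-9])\\.)|(\\.(?![0-9]))', '', edge, perl = TRUE)

print(lookahead_only)  # "a.1,.5,1a,1.5"
print(two_sided)       # "a1,5,1a,1.5"
```

Which behaviour is right depends on the data: the lookahead-only version keeps leading decimals such as '.5' but also keeps the period in 'a.1'.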
Upvotes: 0
Reputation: 4614
You can use a negative lookahead. Match \.(?!\d) and replace it with nothing.
https://regex101.com/r/LNHYOY/1
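The pattern above is written with single backslashes, as on regex101. In an ordinary R string each backslash has to be doubled. As a sketch, assuming R 4.0.0 or later, a raw string lets you paste the pattern unchanged:

```r
txt1 <- 'a,b,a.b,a.,1,2,1.2,1.,.,11,222,11.222,11.'

# Raw string (assumes R >= 4.0.0): the regex is typed exactly as on regex101.
raw_result <- gsub(r"(\.(?!\d))", '', txt1, perl = TRUE)

# Equivalent ordinary string with doubled backslashes.
escaped_result <- gsub('\\.(?!\\d)', '', txt1, perl = TRUE)

print(identical(raw_result, escaped_result))  # TRUE
print(raw_result)  # "a,b,ab,a,1,2,1.2,1,,11,222,11.222,11"
```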
Upvotes: 0