user2246905

Reputation: 1039

Remove all dots but first in a string using R

Some of my numbers contain errors and look like "59.34343.23". I know the first dot is correct, but the second one (and any dot after the first) should be removed. How can I remove those?

I tried using gsub in R:

gsub("(?<=\\..*)\\.", "", "59.34343.23", perl=T)

or

gsub("(?<!^[^.]*)\\.", "", "59.34343.23", perl=T)

However, both calls fail with the error "invalid regular expression". I tried the same patterns in a regex tester and they work there. What is my mistake here?

Upvotes: 2

Views: 1038

Answers (5)

Cary Swoveland

Reputation: 110675

By specifying perl = TRUE you can convert matches of the following regular expression to empty strings:

^[^.]*\.\K|(?!^)\G[^.]*\K\.

Demo

If you are unfamiliar with \K or \G hover over it in the regular expression at the link to see an explanation of its effect.
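As a minimal sketch, the pattern above could be passed to gsub like this, using the string from the question. The variable names x, pattern and result are only illustrative; note that every backslash has to be doubled inside an R string literal.

```r
# Sample input taken from the question
x <- "59.34343.23"

# Same pattern as above, with backslashes doubled for the R string literal
pattern <- "^[^.]*\\.\\K|(?!^)\\G[^.]*\\K\\."

# perl = TRUE is required because \K and \G are PCRE features
result <- gsub(pattern, "", x, perl = TRUE)
print(result)
```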

Upvotes: 0

The fourth bird

Reputation: 163277

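The pattern that you tried does not compile, because the lookbehind (?<=\\..*) contains an unbounded quantifier (.*), and PCRE (the engine used with perl=TRUE) does not support that inside a lookbehind. The same applies to [^.]* in your second pattern.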
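A small sketch of that limitation, assuming a standard R build: the call from the question is wrapped in tryCatch so the compile error can be inspected instead of stopping the script, and a fixed-length lookbehind is shown compiling for comparison. The names bad, outcome and fixed_ok are made up for this illustration.

```r
# Pattern from the question: unbounded .* inside a lookbehind
bad <- "(?<=\\..*)\\."

# Catch the compile error; the accompanying warning is suppressed
outcome <- tryCatch(
  suppressWarnings(gsub(bad, "", "59.34343.23", perl = TRUE)),
  error = function(e) "compile error"
)
print(outcome)

# A lookbehind of fixed length (a single dot) compiles and matches
fixed_ok <- grepl("(?<=\\.)3", "59.34343.23", perl = TRUE)
print(fixed_ok)
```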

Another variation using \G to get continuous matches after the first dot:

(?:^[^.]*\.|\G(?!^))[^.]*\K\.

In parts, the pattern matches:

  • (?: Non capture group for the alternation |
    • ^[^.]*\. Start of string, match any char except ., then match .
    • | Or
    • \G(?!^) Assert the position at the end of the previous match (not at the start)
  • )[^.]* Close the group, then match zero or more chars other than .
  • \K\. Clear the match buffer and match the dot (to be removed)

Regex demo | R demo

gsub("(?:^[^.]*\\.|\\G(?!^))[^.]*\\K\\.", "", "59.34343.23", perl=TRUE)

Output

[1] "59.3434323"

Upvotes: 1

sln

Reputation: 2711

There is always the option to write the dot back only if it's the first one in the line.
The key feature is to consume the other dots but not write them back.
The effect is that every dot after the first is deleted.

The pattern below uses a branch reset group to accomplish this (Perl mode).

(?m)(?|(^[^.\n]*\.)|()\.+)

Replace with $1 (in R's gsub the backreference is written "\\1")

https://regex101.com/r/cHcu4j/1

 (?m)
 (?|
    ( ^ [^.\n]* \. )              # (1)
  | ( )                           # (1)
    \.+ 
 )
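For R specifically, a hedged sketch of how this might be called: the replacement is group 1, written "\\1" in R, and perl = TRUE is needed for the branch reset group. The input vector x and the second sample value are assumptions added for illustration.

```r
# Sample inputs; the second value is made up for illustration
x <- c("59.34343.23", "1.2.3.4")

# Branch reset: both alternatives capture into group 1.
# First alternative keeps everything up to the first dot,
# second alternative captures nothing and consumes later dots.
pattern <- "(?m)(?|(^[^.\\n]*\\.)|()\\.+)"

result <- gsub(pattern, "\\1", x, perl = TRUE)
print(result)
```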

Upvotes: 0

Wiktor Stribiżew

Reputation: 626758

You can use

gsub("^([^.]*\\.)|\\.", "\\1", "59.34343.23")
gsub("^([^.]*\\.)|\\.", "\\1", "59.34343.23", perl=TRUE)

See the R demo online and the regex demo.

Details:

  • ^([^.]*\.) - Capturing group 1 (referred to as \1 from the replacement pattern): any zero or more chars from the start of string and then a . char (the first in the string)
  • | - or
  • \. - any other dot in the string.

The replacement, \1, refers to Group 1, which only holds a value when the first alternative matches, i.e. the text up to and including the first dot. In that case the match is replaced with itself. For every later dot the second alternative matches, Group 1 is empty, and the dot is replaced with an empty string (i.e. the second and all subsequent dots are removed).
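Because gsub is vectorised, the same call can clean a whole column at once. A minimal sketch, assuming the values arrive as a character vector; the names raw, cleaned and nums and the extra sample values are hypothetical:

```r
# Hypothetical input: some values are fine, some carry extra dots
raw <- c("59.34343.23", "1.2.3", "42", "7.5")

# Keep the text up to the first dot (group 1), drop every other dot
cleaned <- gsub("^([^.]*\\.)|\\.", "\\1", raw)
print(cleaned)

# Once only one dot is left, the strings convert to numbers
nums <- as.numeric(cleaned)
print(nums)
```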

Upvotes: 5

akrun

Reputation: 887078

We may use

gsub("^[^.]+\\.(*SKIP)(*FAIL)|\\.", "", str1, perl = TRUE)
[1] "59.3434323"

data

str1 <-  "59.34343.23"
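To see which dots the pattern actually matches, the match positions can be inspected with gregexpr. This is only an illustrative sketch; the names all_dots and later_dots are made up. The first alternative consumes the text up to the first dot and then (*SKIP)(*FAIL) discards that attempt, so only later dots are left for the second alternative.

```r
str1 <- "59.34343.23"

# Positions of every dot in the string
all_dots <- as.integer(gregexpr("\\.", str1)[[1]])
print(all_dots)

# Positions matched by the (*SKIP)(*FAIL) pattern: the first dot is skipped
later_dots <- as.integer(
  gregexpr("^[^.]+\\.(*SKIP)(*FAIL)|\\.", str1, perl = TRUE)[[1]]
)
print(later_dots)
```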

Upvotes: 3
