Reputation: 23
I have a (sorted) list like this and seek a regular expression to match consecutive duplicates.
1,1,1.28,1.35,1.4,1.4,2,2,4,7.5,7.56
I tried different options and the best so far was (?:^|,)([^,]+)(,[ ]*\1)+
, but it mishandles cases like 1,1,1.28, where the 1 at the start of 1.28 is also matched as a duplicate
(see demo).
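To make the failure concrete, here is a minimal Python sketch that runs the attempted pattern against the problem case. The helper name first_match is only illustrative, and Python's re module is assumed here; other regex flavors should behave the same way for this pattern.

```python
import re

# The attempted pattern from the question.
ATTEMPT = re.compile(r"(?:^|,)([^,]+)(,[ ]*\1)+")

def first_match(text):
    """Return the text of the first match, or None if nothing matches."""
    m = ATTEMPT.search(text)
    return m.group(0) if m else None

# The backreference only checks that the next item STARTS with "1",
# so the leading digit of 1.28 is swallowed as a third "duplicate".
print(first_match("1,1,1.28"))  # 1,1,1
```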
In plain words, the regex I need is:
Whatever is between two commas, match it if it has a duplicate
Upvotes: 2
Views: 188
Reputation: 204
My take on this is:
(\b\d+(?:\.\d+)?\b)(?:,\1)+[^\.\d]
What's good about this one in particular is that it matches all the commas between the repeating numbers. That is handy if you only want to retain one copy of a number in the list and delete all the others: you can simply delete the entire match and substitute it with the content of group 1, and the comma order will still be as expected (a,b,c). Or, if you need to remove duplicates entirely, just remove all matches (again, the order will be the same).
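A small Python sketch for inspecting what this pattern actually captures on the list from the question. The helper name find_duplicate_runs is hypothetical, and Python's re module is an assumption, since the answer does not name a regex flavor.

```python
import re

# The pattern from this answer, unchanged.
PATTERN = re.compile(r"(\b\d+(?:\.\d+)?\b)(?:,\1)+[^\.\d]")

def find_duplicate_runs(text):
    """Return (whole match, group 1) for every run of duplicates found."""
    return [(m.group(0), m.group(1)) for m in PATTERN.finditer(text)]

sample = "1,1,1.28,1.35,1.4,1.4,2,2,4,7.5,7.56"
for whole, number in find_duplicate_runs(sample):
    print(repr(whole), "->", repr(number))
```

On this input, each whole match also includes the single character consumed by the trailing [^\.\d] (here, the comma after the run), so a replacement has to take that character into account.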
Explanation:
(\b\d+(?:\.\d+)?\b)
matches a number, possibly a decimal fraction. Word boundaries are used so as not to match "...,11,1,...". That exact ordering is not allowed in a sorted list (11>1), but I included it to make sure there will be no problems of a similar kind.
(?:,\1)+
matches a comma and then the previously found number, one or more times. Here we use the fact that the numbers are sorted.
[^\.\d]
is tricky: if the first non-matching number has a dot and the matching one doesn't, we have to stop and not match the dot. We also must not match "7.5,7.56", and for that we can use "not a digit". But then we have to match everything else, including end of line. So to express "not a digit AND not a dot" I used "not (digit or dot)".
Upvotes: 1
Reputation: 627101
You can use
(?<![^\D,])(\d+(?:\.\d+)?)(?:,\1)+(?![^,]|\.?\d)
Replace with $1. See the regex demo.
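As a rough illustration, the replacement could be applied in Python like this. The function name dedupe_consecutive is made up for the example, and Python's re module writes the $1 replacement as \1; other flavors may use $1 as shown above.

```python
import re

# The pattern from this answer, unchanged.
DUPLICATES = re.compile(r"(?<![^\D,])(\d+(?:\.\d+)?)(?:,\1)+(?![^,]|\.?\d)")

def dedupe_consecutive(csv_line):
    """Collapse each run of identical consecutive numbers to a single copy."""
    # r"\1" is Python's spelling of the $1 replacement.
    return DUPLICATES.sub(r"\1", csv_line)

sample = "1,1,1.28,1.35,1.4,1.4,2,2,4,7.5,7.56"
print(dedupe_consecutive(sample))  # 1,1.28,1.35,1.4,2,4,7.5,7.56
```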
Details:
(?<![^\D,])
- immediately to the left of the current location, there can be no char other than a non-digit or a comma
(\d+(?:\.\d+)?)
- Group 1: one or more digits followed by an optional sequence of . and one or more digits
(?:,\1)+
- one or more sequences of a comma and the Group 1 value
(?![^,]|\.?\d)
- immediately to the right, there can't be a char other than a , or an optional . followed by a digit.
Upvotes: 1