N Kaushik

Reputation: 2208

Android/Java Regex to remove extra zeros from sub-strings

I have the following string as input :

"2.0,3.00,-4.0,0.00,-0.00,0.03,2.01,0.001,-0.03,101"

Final output will be like :

"2,3,-4,0,0,.03,2.01,.001,-.03,101"

That is, all leading and trailing zeros should be removed, and both positive and negative zero should become simply 0.

We could achieve this by splitting the string first and applying a regex to each part, but my string size is more than 10000.
How can we achieve this using a regex?

Edit:

Analysis of Answers:

I have tested all answers with the string "0.00,-0.00,00.00,-00.00,40.00,-40.00,4.0,-4.0,4.01,-4.01,04.01,-04.01,004.04,-004.04,0004.040,-0004.040,101,.40,-.40,0.40,-0.40", and the answer from Wiktor Stribiżew passed all the test cases (see here: https://regex101.com/r/tS8hE3/9). The other answers passed most of the cases, but not all.

Upvotes: 12

Views: 745

Answers (7)

bobble bubble

Reputation: 18490

UPDATE to cover more cases such as 01., .100, 01.10

(?<=,|^)(?:[0.+-]+(?=0(?:,|\.\B|$))|0+(?=[1-9]))|\.0+\b|\b0+(?=\d*\.\b)|\.\B|(?<=[1-9])0+(?=,|$)

This pattern requires more backtracking and can therefore get slower on large input. As a Java String:

"(?<=,|^)(?:[0.+-]+(?=0(?:,|\\.\\B|$))|0+(?=[1-9]))|\\.0+\\b|\\b0+(?=\\d*\\.\\b)|\\.\\B|(?<=[1-9])0+(?=,|$)"

Compared with the previous pattern, this one additionally matches:

  • (?<=,|^)(?:...|0+(?=[1-9])) adds leading zeros preceding [1-9]
  • \.0+\b modified to match period with zeros only before a word boundary
  • \b0+(?=\d*\.\b) match zeros at boundary if period preceded by optional digits ahead
  • \.\B matches a period bordering to a non word boundary (eg .,)
  • (?<=[1-9])0+(?=,|$) matches trailing zeros following [1-9]

Demo at regex101 or Regexplanet (click Java)


Answer before update
You can also try using replaceAll to replace matches of this regex with an empty string.

(?<=,|^)[0.+-]+(?=0(?:,|$))|\.0+\b|\b0+(?=\.)
  • (?<=,|^)[0.+-]+(?=0(?:,|$)) matches all parts that consist only of [0.+-] with at least a trailing zero. Limited by use of lookaround assertions: (?<=,|^) and (?=0(?:,|$))

  • |\.0+\b or match a period followed by one or more zeros and a word boundary.

  • |\b0+(?=\.) or match a boundary followed by one or more zeros if a period is ahead.

Cases not mentioned in the question, such as 0., 01, 1.10, are not covered by this pattern yet. As a Java String:

"(?<=,|^)[0.+-]+(?=0(?:,|$))|\\.0+\\b|\\b0+(?=\\.)"

Demo at regex101 or Regexplanet (click Java)
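As a rough illustration of how the pre-update pattern might be applied in Java, the sketch below runs it over the sample input from the question. The class and method names (ZeroStripDemo, strip) are made up for this example, and the behavior has only been traced for the question's sample input.

```java
import java.util.regex.Pattern;

public class ZeroStripDemo {
    // Pre-update pattern from this answer, compiled once so it can be reused.
    private static final Pattern ZEROS = Pattern.compile(
            "(?<=,|^)[0.+-]+(?=0(?:,|$))|\\.0+\\b|\\b0+(?=\\.)");

    // Hypothetical helper: deletes every match of the pattern.
    static String strip(String csv) {
        return ZEROS.matcher(csv).replaceAll("");
    }

    public static void main(String[] args) {
        String input = "2.0,3.00,-4.0,0.00,-0.00,0.03,2.01,0.001,-0.03,101";
        System.out.println(strip(input)); // 2,3,-4,0,0,.03,2.01,.001,-.03,101
    }
}
```

Compiling the pattern once avoids recompiling it on every call, which String.replaceAll would otherwise do.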

Upvotes: 2

Wiktor Stribiżew

Reputation: 626845

Updated test case answer

Use the following regex:

String rx = "-?0+\\.(0)+\\b|\\.0+\\b|\\b0+(?=\\.\\d*[1-9])|\\b0+(?=[1-9]\\d*\\.)|(\\.\\d*?)0+\\b";

And replace with $1$2. See another demo.

The regex matches several alternatives and captures some parts of the string to later re-insert during replacement:

  • -?0+\.(0)+\b - matching an optional - followed with one or more 0s followed with a . and then captures exactly one 0 but matching one or more occurrences (because the (...) is placed on the 0 and the + is applied to this group); the word boundary at the end requires a non-word character to appear after the last matched 0. In the replacement, we restore the 0 with $1 backreference. So, -00.00 or 00.00 will be replaced with 0.
  • | - or...
  • \.0+\b - a dot followed with one or more zeros before a , (since the string is comma-delimited).
  • | - or...
  • \b0+(?=\.\d*[1-9]) - a word boundary (start of string or a location after ,) followed with one or more 0s that are followed by . + zero or more digits followed by a non-0 digit (so we remove leading zeros in the integer part that only consists of zeros)
  • | - or...
  • \b0+(?=[1-9]\d*\.) - a word boundary followed by one or more zeros followed by a non-0 digit before a . (so, we remove all leading zeros from the integer part that is not equal to 0).
  • | - or...
  • (\.\d*?)0+\b - capturing a .+zero or more digits, but as few as possible, up to the first 0, and then just matching one or more zeros (up to the end of string or ,) (so, we get rid of trailing zeros in the decimal part)
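The alternatives above could be wired up roughly as follows. This is a minimal sketch, assuming the whole comma-separated string is processed in one call; the class and method names (TrimZerosDemo, normalize) are hypothetical and not part of the answer.

```java
import java.util.regex.Pattern;

public class TrimZerosDemo {
    // Updated regex from this answer, compiled once for reuse.
    private static final Pattern RX = Pattern.compile(
            "-?0+\\.(0)+\\b|\\.0+\\b|\\b0+(?=\\.\\d*[1-9])"
            + "|\\b0+(?=[1-9]\\d*\\.)|(\\.\\d*?)0+\\b");

    // Hypothetical helper: $1 restores the single 0 of an all-zero number,
    // $2 restores the decimal digits kept in front of trailing zeros.
    // A group that did not take part in the match contributes nothing.
    static String normalize(String csv) {
        return RX.matcher(csv).replaceAll("$1$2");
    }

    public static void main(String[] args) {
        String input = "2.0,3.00,-4.0,0.00,-0.00,0.03,2.01,0.001,-0.03,101";
        System.out.println(normalize(input)); // 2,3,-4,0,0,.03,2.01,.001,-.03,101
        System.out.println(normalize("101.1010")); // 101.101
    }
}
```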

Answer before the test cases update

I suggest a very simple and short regex that does what you need:

-0+\.(0)+\b|\.0+\b|\b0+(?=\.\d*[1-9])

Replace with $1.

See the regex demo. Short IDEONE demo:

String re = "-0+\\.(0)+\\b|\\.0+\\b|\\b0+(?=\\.\\d*[1-9])"; 
String str = "2.0,3.00,-4.0,0.00,-0.00,0.03,2.01,0.001,-0.03,101,0.001,-0.03";
String expected = "2,3,-4,0,0,.03,2.01,.001,-.03,101,.001,-.03"; 
System.out.println(str.replaceAll(re, "$1").equals(expected)); // prints true

Explanation:

  • -0+\.(0)+\b - a minus followed with one or more 0s (0+) followed with a literal dot (\.) followed with one or more zeros (and capturing just the last 0 matched with (0)+) followed with a word boundary (location before , in this context)
  • | - or...
  • \.0+\b - a literal dot (\.) followed with one or more zeros followed with a word boundary (location before , in this context)
  • | - or...
  • \b0+(?=\.\d*[1-9]) - a word boundary (location after , in this context) followed with one or more zeros that must be followed with a literal dot (\.), then zero or more digits and then a digit from 1 to 9 range (so that the decimal part is more than 0).

Upvotes: 3

Ryan G

Reputation: 71

/(?!-)(?!0)[1-9][0-9]*\.?[0-9]*[1-9](?!0)|(?!-)(?!0)\.?[0-9]*[1-9](?!0)/g

Upvotes: 0

roblovelock

Reputation: 1981

Using the list of numbers from your question, and some additional ones, the following regex replace will remove all leading and trailing zeros.

numbers.replaceAll("\\b0*([1-9]*[0-9]+)(\\.[0-9]*[1-9])?\\.?0*\\b", "$1$2");

with input:

2.0,3.00,-4.0,0.00,-0.00,0.03,2.01,0.001,-0.03,101,101.1010,0020.00

the result is:

2,3,-4,0,-0,0.03,2.01,0.001,-0.03,101,101.101,20

If you want to have decimals without the leading 0 then you can use the following.

numbers.replaceAll("\\b0*([0-9]+)(\\.[0-9]*[1-9])?\\.?0+\\b|0+(\\.[0-9]+?)0*\\b", "$1$2$3");

with input:

2.0,3.00,-4.0,0.00,-0.00,0.03,2.01,0.001,-0.03,101,101.1010,0020.00

the result is:

2,3,-4,0,-0,.03,2.01,.001,-.03,101,101.101,20

Upvotes: 1

bmbigbang

Reputation: 1378

Is it possible to just use replace? For example:

str.replaceAll("\\.0+,|,0+(?=\\.)", ",");

demo

Upvotes: 0

vks

Reputation: 67968

\.0+$|^(-)?0+(?=\.)

You can try this. Replace with $1. If you get an empty string or - after the replacement, replace it with 0. See demo.

https://regex101.com/r/cZ0sD2/7
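A minimal sketch of this per-number variant in Java follows. It assumes the input is split on commas first, which the question hoped to avoid, and the names PerNumberDemo, trimOne and trimAll are invented for the illustration.

```java
public class PerNumberDemo {
    // Applies the single-number regex from this answer to one token.
    static String trimOne(String number) {
        String s = number.replaceAll("\\.0+$|^(-)?0+(?=\\.)", "$1");
        // Fallback described in the answer: an empty string or a lone "-" means zero.
        return (s.isEmpty() || s.equals("-")) ? "0" : s;
    }

    // Hypothetical wrapper: split on commas, trim each token, join again.
    static String trimAll(String csv) {
        String[] parts = csv.split(",", -1);
        for (int i = 0; i < parts.length; i++) {
            parts[i] = trimOne(parts[i]);
        }
        return String.join(",", parts);
    }

    public static void main(String[] args) {
        String input = "2.0,3.00,-4.0,0.00,-0.00,0.03,2.01,0.001,-0.03,101";
        System.out.println(trimAll(input)); // 2,3,-4,0,0,.03,2.01,.001,-.03,101
    }
}
```

String.join requires Java 8 or later; on older runtimes a StringBuilder loop would do the same job.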

If you want to do this on the full string, use

-?0*\.0+\b|\.0+(?=,|$)|(?:^|(?<=,))(-)?0+(?=\.)

See demo.

https://regex101.com/r/cZ0sD2/16

Upvotes: 3

nAviD

Reputation: 3261

You can do it with two replacements:

First, use \.0+(?=(,|$)) and replace with "".

Then use (?!(^|,))-0(?=(,|$)) and replace it with "0".
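The two passes might look like this in Java. This is only a sketch with made-up names (TwoPassDemo, twoPass), using the two patterns exactly as given in this answer.

```java
public class TwoPassDemo {
    // Hypothetical helper that chains the two replacements from this answer.
    static String twoPass(String csv) {
        // Pass 1: drop a fractional part that consists only of zeros.
        String step1 = csv.replaceAll("\\.0+(?=(,|$))", "");
        // Pass 2: turn a remaining "-0" item into "0".
        return step1.replaceAll("(?!(^|,))-0(?=(,|$))", "0");
    }

    public static void main(String[] args) {
        String input = "2.0,3.00,-4.0,0.00,-0.00,0.03,2.01,0.001,-0.03,101";
        System.out.println(twoPass(input)); // 2,3,-4,0,0,0.03,2.01,0.001,-0.03,101
    }
}
```

Neither pass touches the leading zero of fractions such as 0.03, so the result differs from the output requested in the question. Also, because the negative lookahead (?!(^|,)) fails at the very start of the input, a -0.0 in first position appears to be left as -0.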

Upvotes: 0
