Ciarán Tobin

Reputation: 7526

Regex expression to help count only the zeros in a string

I'm trying to count the number of zeros in a string of numbers: not just the character 0, but the number zero. For example, I want to count 0, 0.0, 0.000, etc. The numbers will be separated by spaces, e.g.:

1.0 5.0 1 5.4 12 0.1 14.2675 0.0 0.00005

A simple search for " 0" in the string nearly does the job (I have to first insert a leading space in the string for this to work - in case the first number is a zero). However it doesn't work for numbers in the form 0.x e.g. 0.1, 0.02 etc. I suppose I need to check for 0 and see if there is a decimal point and then non-zero numbers after it, but I have no idea how to do that. Something like:

" 0*|(0\\.(?!\\[1-9\\]))"

Does anyone have any ideas how I might accomplish this? Using a regular expression, preferably. Or, if it's easier, I'm happy to count the number of non-zero elements. Thank you.

NOTE: I'm using split in Java to do this (split the string using the regular expression and then count with the array's .length).

Upvotes: 0

Views: 2195

Answers (3)

Alan Moore

Reputation: 75222

split() is not the solution to this problem, though it can be part of the solution, as Antti's answer demonstrated. You'll find it much easier to match the zero-valued numbers with find() in a loop and count the matches, like this:

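import java.util.regex.Matcher;
import java.util.regex.Pattern;

String s = "1.0 5.0 1 5.4 12 0.1 14.2675 0.0 0.00005 0. .0 0000 -0.0";

// Whole-token match: optional minus sign, then only zeros with at most one decimal point
Pattern p = Pattern.compile("(?<!\\S)-?(?:0+(?:\\.?0*)|\\.0+)(?!\\S)");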
Matcher m = p.matcher(s);
int n = 0;

while (m.find()) {
    System.out.printf("%n%s ", m.group());
    n++;
}
System.out.printf("%n%n%d zeroes total%n", n);

output:

0.0
0.
.0
0000
-0.0

5 zeroes total

This is how Tim meant for you to use the regex in his answer, too (I think). Breaking down my regex, we have:

  • (?<!\\S) is a negative lookbehind that matches a position that's not preceded by a non-whitespace character. It's equivalent to Tim's positive lookbehind, (?<=^|\s), which explicitly matches the beginning of the string or right after a whitespace character.

  • -?(?:0+(?:\\.?0*)|\\.0+) matches an optional minus sign followed by at least one zero and at most one decimal point.

  • (?!\\S) is equivalent to (?=\s|$) - it matches right before a whitespace character or at the end of the string.

The lookbehind and lookahead ensure that you always match the whole token, just like you would if you were splitting on whitespace. Without those, it would also match zeros that are part of a non-zero token like 1230.0456.
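To see what the lookarounds buy you, here is a small sketch that runs the same core pattern with and without them. The class name, method name, and the two-token test string are made up for illustration; only the regex comes from the answer above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaroundDemo {
    // The pattern from the answer, anchored to whole whitespace-separated tokens.
    static final Pattern ANCHORED =
        Pattern.compile("(?<!\\S)-?(?:0+(?:\\.?0*)|\\.0+)(?!\\S)");

    // The same core pattern with the lookbehind and lookahead removed.
    static final Pattern UNANCHORED =
        Pattern.compile("-?(?:0+(?:\\.?0*)|\\.0+)");

    // Counts non-overlapping matches of p in s using find() in a loop.
    static int countMatches(Pattern p, String s) {
        Matcher m = p.matcher(s);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String s = "1230.0456 0.0";
        // Anchored: only the token 0.0 counts.
        System.out.println(countMatches(ANCHORED, s));   // prints 1
        // Unanchored: the "0.0" inside 1230.0456 is matched as well.
        System.out.println(countMatches(UNANCHORED, s)); // prints 2
    }
}
```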


EDIT (in response to a comment): My main objection to using split() is that it's needlessly convoluted. You're creating an array of strings comprising all the parts of the string you don't care about, then doing some math on the array's length to get the information you want. Sure, it's only one line of code, but it does a very poor job of communicating its intent. Anyone who's not already familiar with the idiom could have a very difficult time sussing out what it does.

Then there's the trailing empty tokens issue: if you use the split technique on my revised sample string you'll get a count of 4, not 5. That's because the last chunk of the string matches the split regex, meaning the last token should be an empty string. But Java (following Perl's lead) silently drops trailing empty tokens by default. You can override that behavior by passing a negative integer as the second argument, but what if you forget to do that? It's a very easy mistake to make, and potentially a very difficult one to troubleshoot.
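A quick sketch of the trailing-empty-token problem, assuming the split idiom is "split on the zero regex and subtract one from the array length" (the class and method names are hypothetical):

```java
public class SplitTrailingDemo {
    // The zero-token regex from the answer, as a Java string literal.
    static final String ZERO = "(?<!\\S)-?(?:0+(?:\\.?0*)|\\.0+)(?!\\S)";

    // Counts zeros via split: N matches leave N + 1 pieces, provided no pieces are dropped.
    // A limit of 0 behaves like the one-argument split(); a negative limit keeps trailing empty strings.
    static int countBySplit(String s, int limit) {
        return s.split(ZERO, limit).length - 1;
    }

    public static void main(String[] args) {
        String s = "1.0 5.0 1 5.4 12 0.1 14.2675 0.0 0.00005 0. .0 0000 -0.0";
        // Default behavior: the empty piece after the final -0.0 is silently dropped.
        System.out.println(countBySplit(s, 0));  // prints 4
        // Negative limit: the trailing empty piece is kept, so the count is right.
        System.out.println(countBySplit(s, -1)); // prints 5
    }
}
```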

As for performance, the two approaches are virtually identical in speed (I don't know about the memory they use). It's not likely to be a problem when working with reasonably sized texts.

Upvotes: 1

Tim Pietzcker

Reputation: 336108

How about this:

(?<=^|\s)[0.]+(?=\s|$)

Explanation:

(?<=^|\s) # Assert position after a space or the start of the string
[0.]+     # Match one or more zeroes/decimal points
(?=\s|$)  # Assert position before a space or the end of the string

Remember to double the backslashes in Java strings.
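For illustration, this is roughly how that regex could be used from Java with the backslashes doubled, counting matches with find() rather than split(). The class and method names are hypothetical. One caveat worth knowing: because [0.]+ accepts any mix of zeros and dots, a token consisting only of "." would be counted too, and a negative zero such as -0.0 would not.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ZeroTokenCounter {
    // The regex above with each backslash doubled for a Java string literal.
    static final Pattern ZERO_TOKEN = Pattern.compile("(?<=^|\\s)[0.]+(?=\\s|$)");

    // Counts whitespace-delimited tokens made up only of zeros and decimal points.
    static int count(String s) {
        Matcher m = ZERO_TOKEN.matcher(s);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // The sample string from the question; only the token 0.0 qualifies.
        System.out.println(count("1.0 5.0 1 5.4 12 0.1 14.2675 0.0 0.00005")); // prints 1
    }
}
```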

Upvotes: 3

You should instead split on whitespace and call Double.parseDouble() on each fragment; if the fragment parses as a double, compare it to 0.

String[] parts = numbers.split("\\s+");
int numZeros = 0;
for (String s : parts) {
    try {
        if (Double.parseDouble(s) == 0) {
            numZeros++;
        }
    } catch (NumberFormatException e) {
        // Not a valid number (e.g. an empty fragment); skip it.
    }
}

There is no easy regex solution anyway. The most obvious idea is to use the \b word-boundary anchor, but it fails badly. Also, using Double.parseDouble means that things like -0 are supported too.
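A rough sketch of why \b goes wrong. The pattern \b0\b is only a guess at what a naive attempt might look like (it is not taken from the question), and the class and method names are made up; the parsing method mirrors the loop above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordBoundaryDemo {
    // Hypothetical naive attempt: a single 0 between word boundaries.
    static int countWithWordBoundary(String s) {
        Matcher m = Pattern.compile("\\b0\\b").matcher(s);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    // The split-and-parse approach: count fragments whose numeric value is zero.
    static int countByParsing(String s) {
        int n = 0;
        for (String part : s.trim().split("\\s+")) {
            try {
                if (Double.parseDouble(part) == 0) {
                    n++;
                }
            } catch (NumberFormatException e) {
                // Not a number; ignore this fragment.
            }
        }
        return n;
    }

    public static void main(String[] args) {
        String s = "0.1 10 -0";
        // '.' is not a word character, so \b also matches between the 0 and the dot in 0.1.
        System.out.println(countWithWordBoundary(s)); // prints 2
        // Parsing sees only one zero-valued number: -0.
        System.out.println(countByParsing(s));        // prints 1
    }
}
```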

Upvotes: 2
