Dawood

Reputation: 5306

How to check if and what type of number a string represents?

How can I check whether a string represents a long, a double, or just a regular string? I need to do this because the value has to be indexed in a database according to its type. Currently I'm doing this by trying to parse the string and checking for exceptions, but since the code is called very frequently, I'm wondering if there's a more efficient way to do it. My code currently looks like this:

String value = ...;
// For example, could be "213678", "654.1236781", or "qwerty12345"

try {
    Long longValue = Long.parseLong(value);
    // Index 'longValue' in the database
} catch (NumberFormatException parseLongException) {
    try {
        Double doubleValue = Double.parseDouble(value);
        // Index 'doubleValue' in the database
    } catch (NumberFormatException parseDoubleException) {
        // Index 'value' in the database
    }
}

EDIT:

I just did a quick benchmarking exercise as per @user949300's suggestion to use regex patterns, and it ran noticeably faster than the exception handling code above (1565 ms versus 2561 ms in the results below). Here's the code in case someone else finds it useful:

Pattern longPattern = Pattern.compile("^[-+]?[0-9]+$");
Pattern doublePattern = Pattern.compile("^[-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?$");

// Check for long regex pattern before the double regex pattern
// since the former is a strict subset of the latter
if (longPattern.matcher(value).matches()) {
    // Perform indexing for long in the database
} else if (doublePattern.matcher(value).matches()) {
    // Perform indexing for double in the database
} else {
    // Perform indexing for string in the database
}

Here are the benchmarking results for checking 50,000 entries where the approximate breakdown of types is 50% longs, 10% doubles, 40% strings (representative of the workload that my application processes):

--- Exception handling code ---
STRING - actual: 19861, found: 19861
DOUBLE - actual: 4942, found: 4942
LONG - actual: 25197, found: 25197
Time taken: 2561 ms

--- Regex pattern matching code ---
STRING - actual: 19861, found: 19861
DOUBLE - actual: 4942, found: 4942
LONG - actual: 25197, found: 25197
Time taken: 1565 ms

Upvotes: 7

Views: 739

Answers (6)

ILMTitan

Reputation: 11017

If you don't need to worry about your Longs being negative, you can probably use NumberUtils.isDigits() and NumberUtils.isNumber() from the Apache Commons Lang library.

if (NumberUtils.isDigits(string)) {
    //Index long
} else if(NumberUtils.isNumber(string)){
    //Index double
} else {
    //Index string
}

Upvotes: 0

Joni

Reputation: 111259

One possibility is java.io.StreamTokenizer:

Reader r = new StringReader(value);
StreamTokenizer st = new StreamTokenizer(r);
int tokenType = st.nextToken();
double number;
String word;
switch (tokenType) {
    case StreamTokenizer.TT_NUMBER: // it's a number
         number = st.nval; break;
    case StreamTokenizer.TT_WORD: // it's a string
         word = st.sval; break;
}

It can be kind of tricky to use though.
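
To illustrate the tricky part, here is a minimal sketch (the class and method names are made up for this example). It only treats the value as numeric when the tokenizer reports exactly one number token followed by end of input, so something like "12abc" is rejected. Keep in mind that StreamTokenizer stores numbers as a double in nval, does not parse exponents such as 1e12, and by default gives special meaning to characters like / and quotes, so this is a rough filter rather than a drop-in replacement for the parse calls.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;

public class TokenizerCheck {
    /** True only if the whole string is a single numeric token (hypothetical helper). */
    public static boolean isSingleNumber(String value) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(value));
        try {
            if (st.nextToken() != StreamTokenizer.TT_NUMBER) {
                return false;
            }
            // Anything left after the number (e.g. "abc" in "12abc") means
            // the string as a whole is not a number.
            return st.nextToken() == StreamTokenizer.TT_EOF;
        } catch (IOException e) {
            // Not expected for an in-memory StringReader; treat as non-numeric.
            return false;
        }
    }
}
```

This only tells you number versus not-a-number; telling a long from a double still needs a separate check, because nval is always a double.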

Upvotes: 1

user949300

Reputation: 15729

Have you considered regular expressions?

If the String contains anything other than - (at the beginning), and 0-9 or ., it is a String. (Note - this ignores internationalization and scientific notation - are they issues?)

Otherwise, if it contains a ., it is a double. (Well, you should test for only a single ., but this is a start)

Otherwise, it is a long.

Out of paranoia, I still might check for Exceptions, but that might be a faster way.

NOTE ADDED I'm guessing that testing the regex is faster than throwing exceptions out of the various parse routines, but this might not actually be true. You should do some tests.
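
A rough sketch of those steps, with the exception handling kept as a safety net. The class name, method name, and exact patterns are assumptions for illustration: they accept only a leading minus sign, require at least one digit after the decimal point, and ignore scientific notation and internationalization, as noted above.

```java
import java.util.regex.Pattern;

public class RegexThenParse {
    // Compiled once and reused; compiling a Pattern on every call would be wasteful.
    private static final Pattern LONG_LIKE = Pattern.compile("-?[0-9]+");
    private static final Pattern DOUBLE_LIKE = Pattern.compile("-?[0-9]*\\.[0-9]+");

    /** Returns a Long, a Double, or the original String (hypothetical helper). */
    public static Object classify(String value) {
        if (LONG_LIKE.matcher(value).matches()) {
            try {
                return Long.parseLong(value);
            } catch (NumberFormatException tooBig) {
                // Digits only, but outside the long range. Mirror the question's
                // nested try/catch, which would treat this value as a double.
                return Double.parseDouble(value);
            }
        }
        if (DOUBLE_LIKE.matcher(value).matches()) {
            return Double.parseDouble(value);
        }
        return value;
    }
}
```

With this layout an exception is only thrown for the rare digit-only value that overflows a long, so ordinary strings never pay the exception cost.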

Upvotes: 3

Michał Šrajer

Reputation: 31182

Your code looks good.

Do some profiling, and if it shows that your code is too slow, then you can think about potential optimizations (like a simple loop to see if all characters are digits).

Do not try to optimize your code before profiling, especially in languages like Java.
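
As a first step before reaching for a real profiler, a rough timing comparison can look something like the sketch below. The class and method names are made up for this example, and hand-rolled timing like this is easily distorted by JIT warm-up and garbage collection, so treat the numbers as a hint only; a dedicated harness such as JMH is more trustworthy.

```java
import java.util.function.Consumer;

public class RoughTimer {
    /**
     * Runs the task over all inputs for the given number of rounds and returns
     * the best (lowest) wall-clock time of a single round, in milliseconds.
     * Hypothetical helper, not a substitute for a profiler.
     */
    public static long bestOfMillis(String[] inputs, Consumer<String> task, int rounds) {
        long best = Long.MAX_VALUE;
        for (int r = 0; r < rounds; r++) {
            long start = System.nanoTime();
            for (String s : inputs) {
                task.accept(s);
            }
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            best = Math.min(best, elapsedMs);
        }
        return best;
    }
}
```

Taking the best of several rounds lets the early, not yet compiled rounds drop out of the result. Run it on a sample that matches your real mix of longs, doubles, and strings.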

Upvotes: 1

DNA

Reputation: 42607

You might be able to get some improvement (especially if you can rule out scientific notation e.g. 1e12) by just checking for non-digits to detect a long.

Long.parseLong() delegates to a general method that works in any number base, so a decimal-only method might be a bit faster.

Don't forget minus signs, if these are possible in your data...

Doubles are harder because 654.1236871 is valid, but 654.12.36.87...1 is not, though they contain the same set of characters. So full parsing is probably needed.
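
A minimal sketch of that idea, assuming only a leading minus sign is allowed on longs and that scientific notation is left to the double parser. All names here are invented for the example. A string of at most 18 digits always fits in a long, so it can be classified without parsing; longer digit strings and everything else fall back to the library parsers.

```java
public class ScanClassifier {
    public enum Kind { LONG, DOUBLE, STRING }

    /** Counts digits after an optional '-'; returns -1 if any other character appears. */
    static int digitCount(String s) {
        int start = s.startsWith("-") ? 1 : 0;
        for (int i = start; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < '0' || c > '9') {
                return -1;
            }
        }
        return s.length() - start;
    }

    public static Kind classify(String value) {
        int digits = digitCount(value);
        if (digits >= 1 && digits <= 18) {
            return Kind.LONG; // at most 18 decimal digits cannot overflow a long
        }
        if (digits > 18) {
            // 19 or more digits may overflow, so let Long.parseLong decide.
            try {
                Long.parseLong(value);
                return Kind.LONG;
            } catch (NumberFormatException e) {
                // Too large for a long; fall through to the double check.
            }
        }
        try {
            Double.parseDouble(value); // full parsing, rejects 654.12.36.87
            return Kind.DOUBLE;
        } catch (NumberFormatException e) {
            return Kind.STRING;
        }
    }
}
```

Plain strings still go through the exception path in Double.parseDouble, so this mainly helps when longs dominate the input.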

Upvotes: 1

Nate W.

Reputation: 9249

As far as I know there's no elegant way to do this other than what you're already doing. I would recommend that you parse them in order from most common to least common so as to make this as quick as possible.

If you've got more than 3 possible types you're going to have a deep and ugly try-catch nest, but technically it will be faster than if you broke out each parse attempt into its own method; the tradeoff here is whether you want code clarity or faster execution - it sounds like you might want the latter.

Upvotes: 2
