Reputation: 5306
How can I check whether a string represents a long, a double, or just a regular string? I need to do this because this value needs to be indexed in a database according to its type. Currently I'm doing this by trying to parse the string and checking for exceptions but since the code is called very frequently, I'm wondering if there's a more efficient way to do it. My code currently looks like this:
String value = ...;
// For example, could be "213678", "654.1236781", or "qwerty12345"
try {
    Long longValue = Long.parseLong(value);
    // Index 'longValue' in the database
} catch (NumberFormatException parseLongException) {
    try {
        Double doubleValue = Double.parseDouble(value);
        // Index 'doubleValue' in the database
    } catch (NumberFormatException parseDoubleException) {
        // Index 'value' in the database
    }
}
EDIT:
I just ran a quick benchmark, following @user949300's suggestion to use regex patterns, and the regex version was roughly 39% faster than the exception-handling code above (1565 ms vs. 2561 ms). Here's the code in case someone else finds it useful:
Pattern longPattern = Pattern.compile("^[-+]?[0-9]+$");
Pattern doublePattern = Pattern.compile("^[-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?$");
// Check for long regex pattern before the double regex pattern
// since the former is a strict subset of the latter
if (longPattern.matcher(value).matches()) {
    // Perform indexing for long in the database
} else if (doublePattern.matcher(value).matches()) {
    // Perform indexing for double in the database
} else {
    // Perform indexing for string in the database
}
Here are the benchmarking results for checking 50,000 entries where the approximate breakdown of types is 50% longs, 10% doubles, 40% strings (representative of the workload that my application processes):
--- Exception handling code ---
STRING - actual: 19861, found: 19861
DOUBLE - actual: 4942, found: 4942
LONG - actual: 25197, found: 25197
Time taken: 2561 ms
--- Regex pattern matching code ---
STRING - actual: 19861, found: 19861
DOUBLE - actual: 4942, found: 4942
LONG - actual: 25197, found: 25197
Time taken: 1565 ms
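For anyone who wants to repeat the comparison, here is a minimal sketch of a timing harness. It is not the code that produced the numbers above: the class name, the helper methods, and the generated sample data (built to follow the same 50% / 10% / 40% split) are all assumptions made for illustration. A naive `System.nanoTime()` loop like this is sensitive to JIT warm-up, so treat its timings as rough; a tool such as JMH would give more reliable figures.

```java
import java.util.regex.Pattern;

final class ClassifyBenchmark {
    // Same patterns as in the question, compiled once and reused.
    private static final Pattern LONG_PATTERN = Pattern.compile("^[-+]?[0-9]+$");
    private static final Pattern DOUBLE_PATTERN =
            Pattern.compile("^[-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?$");

    static final int LONG = 0, DOUBLE = 1, STRING = 2;

    // Classification by attempting to parse and catching the exceptions.
    static int byException(String value) {
        try {
            Long.parseLong(value);
            return LONG;
        } catch (NumberFormatException notLong) {
            try {
                Double.parseDouble(value);
                return DOUBLE;
            } catch (NumberFormatException notDouble) {
                return STRING;
            }
        }
    }

    // Classification by regex matching, long pattern first.
    static int byRegex(String value) {
        if (LONG_PATTERN.matcher(value).matches()) return LONG;
        if (DOUBLE_PATTERN.matcher(value).matches()) return DOUBLE;
        return STRING;
    }

    // Counts how many values fall into each category.
    static int[] count(String[] values, boolean useRegex) {
        int[] counts = new int[3];
        for (String v : values) {
            counts[useRegex ? byRegex(v) : byException(v)]++;
        }
        return counts;
    }

    // Hypothetical data: 50% longs, 10% doubles, 40% plain strings.
    static String[] sampleData(int n) {
        String[] values = new String[n];
        for (int i = 0; i < n; i++) {
            int slot = i % 10;
            if (slot < 5) values[i] = String.valueOf(i);
            else if (slot == 5) values[i] = i + ".5";
            else values[i] = "qwerty" + i;
        }
        return values;
    }

    public static void main(String[] args) {
        String[] values = sampleData(50_000);
        for (boolean useRegex : new boolean[] {false, true}) {
            long start = System.nanoTime();
            int[] c = count(values, useRegex);
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.printf("%s: long=%d double=%d string=%d, %d ms%n",
                    useRegex ? "regex" : "exception",
                    c[LONG], c[DOUBLE], c[STRING], elapsedMs);
        }
    }
}
```

Checking that both approaches report the same counts before comparing their times guards against benchmarking two classifiers that disagree on the input.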
Upvotes: 7
Views: 739
Reputation: 11017
If you don't need to worry about your Longs being negative, you can probably use NumberUtils.isDigits() and NumberUtils.isNumber() from the Apache Commons Lang library.
if (NumberUtils.isDigits(string)) {
    // Index long
} else if (NumberUtils.isNumber(string)) {
    // Index double
} else {
    // Index string
}
Upvotes: 0
Reputation: 111259
One possibility is java.io.StreamTokenizer:
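import java.io.Reader;
import java.io.StreamTokenizer;
import java.io.StringReader;

Reader r = new StringReader(value);
StreamTokenizer st = new StreamTokenizer(r);
int tokenType = st.nextToken(); // declared to throw IOException
double number;
String word;
switch (tokenType) {
    case StreamTokenizer.TT_NUMBER: // it's a number (nval is always a double)
        number = st.nval;
        break;
    case StreamTokenizer.TT_WORD: // it's a string
        word = st.sval;
        break;
    default: // a single character such as punctuation, or end of input (TT_EOF)
        break;
}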
It can be kind of tricky to use though.
Upvotes: 1
Reputation: 15729
Have you considered regular expressions?
If the String contains anything other than - (at the beginning), and 0-9 or ., it is a String. (Note - this ignores internationalization and scientific notation - are they issues?)
Otherwise, if it contains a ., it is a double. (Well, you should test for only a single ., but this is a start)
Otherwise, it is a long.
Out of paranoia, I still might check for Exceptions, but that might be a faster way.
NOTE ADDED: I'm guessing that testing the regex is faster than throwing exceptions out of the various parse routines, but this might not actually be true. You should do some tests.
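The rules above could be sketched roughly as follows. This is only an illustration: the class, enum, and method names are made up, and it assumes the input is non-null with no internationalized digits and no scientific notation. The pattern allows at most one `.` and requires a digit after it, so `"5."` is treated as a string even though `Double.parseDouble` would accept it. The "paranoia" parse is kept for digit-only values that are too large for a long; those are reported as doubles here, which is what the nested try-catch in the question would do.

```java
import java.util.regex.Pattern;

final class RegexClassifier {
    enum Kind { LONG, DOUBLE, STRING }

    // Optional leading '-', digits, at most one '.', and at least one trailing digit.
    // Compiled once; Pattern instances are immutable and safe to share.
    private static final Pattern NUMERIC = Pattern.compile("-?[0-9]*\\.?[0-9]+");

    static Kind classify(String value) {
        // Anything that is not sign/digits/single dot is a plain string.
        if (!NUMERIC.matcher(value).matches()) {
            return Kind.STRING;
        }
        // A '.' means it is a double.
        if (value.indexOf('.') >= 0) {
            return Kind.DOUBLE;
        }
        // Otherwise it should be a long; still parse to catch overflow.
        try {
            Long.parseLong(value);
            return Kind.LONG;
        } catch (NumberFormatException tooLarge) {
            // All digits but outside the long range: fall back to double.
            return Kind.DOUBLE;
        }
    }

    public static void main(String[] args) {
        for (String s : new String[] {"213678", "654.1236781", "qwerty12345"}) {
            System.out.println(s + " -> " + classify(s));
        }
    }
}
```

With this layout an exception can only be thrown for the rare overflow case, so the common paths never pay for one.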
Upvotes: 3
Reputation: 31182
Your code looks good.
Do some profiling, and if based on it you find your code too slow, then you can think about potential optimizations (like a simple loop to see if all characters are digits).
Do not try to optimize your code before profiling, especially in languages like Java.
Upvotes: 1
Reputation: 42607
You might be able to get some improvement (especially if you can rule out scientific notation, e.g. 1e12) by just checking for non-digits to detect a long.
Long.parseLong() delegates to a general method that works in any number base, so a decimal-only method might be a bit faster.
Don't forget minus signs, if these are possible in your data...
Doubles are harder because 654.1236871 is valid, but 654.12.36.87...1 is not, though they contain the same set of characters. So full parsing is probably needed.
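A minimal sketch of that idea, with hypothetical class and method names: a decimal-only character scan decides whether the value can be a long, and everything else falls back to full parsing with `Double.parseDouble`. It assumes non-null input. Note that `Double.parseDouble` also accepts forms such as `1e12`, `NaN`, and values with surrounding whitespace, so those would be classified as doubles.

```java
final class ScanClassifier {
    enum Kind { LONG, DOUBLE, STRING }

    /** True if s is an optional '+' or '-' followed by one or more ASCII digits. */
    static boolean looksLikeLong(String s) {
        int start = 0;
        if (!s.isEmpty() && (s.charAt(0) == '-' || s.charAt(0) == '+')) {
            start = 1;
        }
        if (start == s.length()) {
            return false; // empty string or a lone sign
        }
        for (int i = start; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < '0' || c > '9') {
                return false;
            }
        }
        return true;
    }

    static Kind classify(String s) {
        if (looksLikeLong(s)) {
            try {
                Long.parseLong(s);
                return Kind.LONG;
            } catch (NumberFormatException tooLarge) {
                // Too many digits for a long; fall through to the double check.
            }
        }
        // Doubles get full parsing, since a character check alone cannot
        // tell 654.1236871 from 654.12.36.87.
        try {
            Double.parseDouble(s);
            return Kind.DOUBLE;
        } catch (NumberFormatException notNumeric) {
            return Kind.STRING;
        }
    }

    public static void main(String[] args) {
        for (String s : new String[] {"213678", "654.1236781", "qwerty12345"}) {
            System.out.println(s + " -> " + classify(s));
        }
    }
}
```

Compared with the nested try-catch, this avoids the failed `Long.parseLong` attempt (and its exception) for every double and every plain string; plain strings still cost one exception from `Double.parseDouble`.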
Upvotes: 1
Reputation: 9249
As far as I know there's no elegant way to do this other than that. I would recommend that you parse them in the order of most common to least common so as to make this as quick as possible.
If you've got more than 3 possible types you're going to have a deep and ugly try-catch nest, but technically it will be faster than if you broke out each parse attempt into its own method. The tradeoff here is whether you want code clarity or faster execution, and it sounds like you might want the latter.
Upvotes: 2