Reputation: 1677
I need to compare a String(which is a valid integer) with an int value.
For String str
, int integer
, my options are :
Integer.parseInt(str) == integer
str.equals(Integer.toString(integer))
Which one is faster and is there any better way?
Following is inspired by EqualsIntString of atlaste's answer
private static boolean checkEquality(final String string, final long value) {
if (string == null) {
return false;
}
final int length = string.length();
if (length == 0) {
return false;
}
long absValue;
final byte minIndex;
if (string.charAt(0) == '-') {
if (value > 0) {
return false;
}
absValue = -value;
minIndex = 1;
} else {
if (value < 0) {
return false;
}
absValue = value;
if (string.charAt(0) == '+') {
minIndex = 1;
} else {
minIndex = 0;
}
}
for (int idx = length - 1; idx >= minIndex; idx--) {
final byte rem = (byte) (absValue % 10);
absValue /= 10;
final int diff = string.charAt(idx) - rem - 48;
if (diff != 0) {
return false;
}
}
return absValue == 0;
}
Upvotes: 6
Views: 1487
Reputation: 328568
Only one way to know: measure.
A quick JMH benchmark shows that Integer.parseInt
is quicker, regardless of the magnitude of the integer. But we are talking about 10 nanoseconds or so, and it's unlikely to make much difference at your application level.
If you are 100% sure that your string is a valid integer, you can also parse it manually, which will be even faster (see the parseManual
and the equalsIntString
methods in the code below). Which method to use also depends on whether you expect the values to be often equal or often different (if they are often equal, parseManual
works better, if you expect them to be different on average, equalsIntString
is faster).
So the characteristics of your dataset also matters in the decision.
Full results (score is in nanoseconds per operations: smaller is better) - the (n)
column is the integer that is being compared. The first part copmares equal values and the second part compares unequal values.
Benchmark (n) Mode Samples Score Error Units
c.a.p.SO30507506.manual 1 avgt 10 6.579 ± 0.131 ns/op
c.a.p.SO30507506.manual 12345 avgt 10 10.017 ± 0.401 ns/op
c.a.p.SO30507506.manual 123456789 avgt 10 12.490 ± 0.243 ns/op
c.a.p.SO30507506.manualAtlaste 1 avgt 10 7.914 ± 0.144 ns/op
c.a.p.SO30507506.manualAtlaste 12345 avgt 10 15.902 ± 0.593 ns/op
c.a.p.SO30507506.manualAtlaste 123456789 avgt 10 28.117 ± 0.463 ns/op
c.a.p.SO30507506.parse 1 avgt 10 8.495 ± 0.325 ns/op
c.a.p.SO30507506.parse 12345 avgt 10 21.614 ± 0.564 ns/op
c.a.p.SO30507506.parse 123456789 avgt 10 34.692 ± 0.572 ns/op
c.a.p.SO30507506.stringEquals 1 avgt 10 21.597 ± 0.594 ns/op
c.a.p.SO30507506.stringEquals 12345 avgt 10 36.948 ± 1.144 ns/op
c.a.p.SO30507506.stringEquals 123456789 avgt 10 44.444 ± 1.011 ns/op
c.a.p.SO30507506.manual_unequal 1 avgt 10 7.011 ± 0.149 ns/op
c.a.p.SO30507506.manual_unequal 12345 avgt 10 10.244 ± 0.350 ns/op
c.a.p.SO30507506.manual_unequal 123456789 avgt 10 13.135 ± 0.797 ns/op
c.a.p.SO30507506.manualAtlaste_unequal 1 avgt 10 4.328 ± 0.111 ns/op
c.a.p.SO30507506.manualAtlaste_unequal 12345 avgt 10 4.359 ± 0.115 ns/op
c.a.p.SO30507506.manualAtlaste_unequal 123456789 avgt 10 4.339 ± 0.103 ns/op
c.a.p.SO30507506.parse_unequal 1 avgt 10 8.304 ± 0.251 ns/op
c.a.p.SO30507506.parse_unequal 12345 avgt 10 21.514 ± 0.405 ns/op
c.a.p.SO30507506.parse_unequal 123456789 avgt 10 35.257 ± 1.043 ns/op
c.a.p.SO30507506.stringEquals_unequal 1 avgt 10 19.060 ± 0.162 ns/op
c.a.p.SO30507506.stringEquals_unequal 12345 avgt 10 31.829 ± 0.427 ns/op
c.a.p.SO30507506.stringEquals_unequal 123456789 avgt 10 41.870 ± 0.252 ns/op
Code:
@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
public class SO30507506 {
@Param({"1", "12345", "123456789"}) int n;
int i;
String s;
@Setup public void setup() {
i = n;
s = String.valueOf(n);
}
@Benchmark public boolean parse() {
return Integer.parseInt(s) == i;
}
@Benchmark public boolean stringEquals() {
return s.equals(Integer.toString(i));
}
@Benchmark public boolean manual() {
return parseManual(s) == i;
}
@Benchmark public boolean manualAtlaste() {
return equalsIntString(i, s);
}
@Benchmark public boolean parse_unequal() {
return Integer.parseInt(s) == i * 2;
}
@Benchmark public boolean stringEquals_unequal() {
return s.equals(Integer.toString(i * 2));
}
@Benchmark public boolean manual_unequal() {
return parseManual(s) == i * 2;
}
@Benchmark public boolean manualAtlaste_unequal() {
return equalsIntString(i * 2, s);
}
private static int parseManual(String s) {
int result = 0;
int sign = s.charAt(0) == '-' ? -1 : 1;
int startIndex = (s.charAt(0) >= '0' && s.charAt(0) <= '9') ? 0 : 1;
for (int i = startIndex; i < s.length(); i++) {
result *= 10;
result += s.charAt(i) - '0';
}
return result * sign;
}
private static boolean equalsIntString(int value, String s) {
if (s.isEmpty()) return false; // This is never good.
if ((s.charAt(0) == '-' && value >= 0) || (s.charAt(0) != '-' && value < 0)) return false; // positive/negative check
// Define the limit. This is basically the end of the string to check.
int limit = 0;
if (value < 0) {
limit = 1;
value = -value;
}
for (int i = s.length() - 1; i >= limit; --i) {
char expected = (char) ('0' + (value % 10)); // the modulo will be optimized by the JIT because 10 is a constant
value /= 10; // same story.
if (s.charAt(i) != expected) return false;
}
return true;
}
}
Upvotes: 5
Reputation: 31106
Interesting question. For clarity, there's a lot you can do wrong with testing this, because Java does stuff with strings that might influence the results. So let's start with building a proper test.
Constructing your test
Specifically: a proper test doesn't rely on the loadstring, because that influences memory allocation. You want to make a test using dynamically constructed strings.
The 10-log of your integer (e.g. the length of the string) will influence the test outcome. The longer the string, the longer Integer.tryParse
will take. If you have a longer string, it will need to calculate more div/mul and take longer. An additional thing that influences performance is the '-' sign. If you have unsigned integers, this should be taken into account.
Basically measuring means:
Be sure to make a huge array for this during the test, so that your tests won't be influenced. Also make sure that the integers / random numbers that you use have the same characteristics as your data... Because of this, I cannot execute the tests for you, so I'll just stick to the theory.
String to integer equality
It helps to know how string to integer conversion works, so let's start with a blunt solution and work our way up. I currently don't have Java on my laptop, so I'm sorry for the C# syntax :-) You should be easily able to fix it though...
public int ConvertStringToInt(string s)
{
int val = 0;
if (s[0] == '-') // 1
{
for (int i = 1; i < s.Length; ++i )
{
if (s[i] >= '0' && s[i] <= '9') // 2
{
throw new Exception();
}
val = val * 10 + s[i] - '0';
}
return -val;
}
else
{
for (int i = 0; i < s.Length; ++i)
{
if (s[i] >= '0' && s[i] <= '9')
{
throw new Exception();
}
val = val * 10 + s[i] - '0';
}
return val;
}
}
If you know for a fact that the numbers in the string are never negative, you can of course drop the condition 1. Also, if you know for a fact that the string is always a number (which is IMO implied in your question), you can optimize 2. A little trick that I usually use is to use arithmetic overflows to generate large unsigned numbers, which in turn removes an additional condition from 2. You'll end up with:
public int ConvertStringToInt(string s)
{
int val = 0;
if (s[0] == '-')
{
for (int i = 1; i < s.Length; ++i )
{
val = val * 10 + s[i] - '0';
}
return -val;
}
else
{
for (int i = 0; i < s.Length; ++i)
{
val = val * 10 + s[i] - '0';
}
return val;
}
}
Next up, you want equality instead of conversion. So, how lazy can we evaluate this? Well, we need to parse pretty much all of the string before we can do the check. The only thing we know is that if we encounter a '-' char, we also need a negative integer. I ended up with this:
public bool EqualsStringInt(string s, int value)
{
int val = 0;
if (s[0] == '-')
{
if (value >= 0) { return false; } // otherwise we expected another char
for (int i = 1; i < s.Length; ++i )
{
val = val * 10 + s[i] - '0'; // s[i] must be a char between '0'-'9' as implied by the question.
}
return (-val) == value;
}
else
{
if (value < 0) { return false; } // otherwise we expected another char
for (int i = 0; i < s.Length; ++i)
{
val = val * 10 + s[i] - '0';
}
return val == value;
}
}
Integer to string equality
I've written a bit of code in the past for C++ that converts integers to strings here: C++ performance challenge: integer to std::string conversion . There are some good solutions here as well that might be worth considering if you're really looking for performance.
Just checking equality is easier than that though. If you look closely at the algorithm, you'll notice:
Both of these should be time consuming in the long run, and both of them will influence your tests.
At this point, it's interesting to note that you don't really need the complete string though - you just need a single character. So, let's work with that:
Or, in code:
public bool EqualsIntString(int value, string s)
{
if (s.Length == 0) { return false; } // This is never good.
if ((s[0] == '-' && value >= 0) || (s[0] != '-' && value < 0)) { return false; } // positive/negative check
// Define the limit. This is basically the end of the string to check.
int limit = 0;
if (value < 0) // 1
{
limit = 1;
value = -value;
}
for (int i=s.Length-1; i>=limit; --i)
{
char expected = (char)('0' + (value % 10)); // the modulo will be optimized by the JIT because 10 is a constant
value /= 10; // same story.
if (s[i] != expected) { return false; }
}
return true;
}
Again, if you don't have negative numbers, do the obvious optimization. by removing 1.
Can you do even faster? Well yes... that's why I posted the C++ link in the first place. Most of these algorithms can easily be adjusted for this 'equality' case.
Optional optimizations for the last solution
You can use a 10log to determine the length of the string. This implies a lower and an upper bound value to the integer. A simple lookup table can do this for you. However, 10log is quite slow if not properly implemented, so be sure to test this!
Which one is faster
Construct a proper test, and test it. I tried to test it here, but don't have the characteristics of your data, which I expect to make a difference.
Of course, if you don't need such blunt performance, use the standard implementations and equals, and test it.
Upvotes: 1
Reputation: 46382
I'd bet that
Integer.parseInt(str) == integer
is faster. All it takes is one multiplication and one addition per digit, while Integer.toString
needs division and modulus per digit - the slowest int arithmetic operations.
Actually, there are some optimizations reducing the number of operations in Integer.toString
, but the speed ratio is just too big.
Moreover, there's memory allocation for the new string.
OTOH, Integer.parseInt
accepts all Unicode digits, not only 0 to 9, which slows it down a lot. I guess, Guava's Ints.parseInt
is way faster.
In case the string can't be parsed, then you'd get a NumberFormatException
. You can catch it, but it costs a lot of time and then Integer.toString
is the winner.
Upvotes: 0
Reputation: 6390
I would go for the Interger.Parse(), which is likely to be more resilient in the face of different localisation cultures etc than the other approach...
Speed isn't really the issue - valid results are.
Upvotes: -1