lab bhattacharjee
lab bhattacharjee

Reputation: 1677

String to int OR int to String : which is faster?

I need to compare a String(which is a valid integer) with an int value.

For String str, int integer, my options are :

  1. Integer.parseInt(str) == integer

  2. str.equals(Integer.toString(integer))

Which one is faster and is there any better way?

Following is inspired by EqualsIntString of atlaste's answer

 private static boolean checkEquality(final String string, final long value) {

    if (string == null) {
        return false;
    }
    final int length = string.length();
    if (length == 0) {
        return false;
    }

    long absValue;
    final byte minIndex;
    if (string.charAt(0) == '-') {
        if (value > 0) {
            return false;
        }
        absValue = -value;
        minIndex = 1;
    } else {
        if (value < 0) {
            return false;
        }
        absValue = value;
        if (string.charAt(0) == '+') {
            minIndex = 1;
        } else {
            minIndex = 0;
        }
    }

    for (int idx = length - 1; idx >= minIndex; idx--) {
        final byte rem = (byte) (absValue % 10);
        absValue /= 10;
        final int diff = string.charAt(idx) - rem - 48;
        if (diff != 0) {
            return false;
        }
    }

    return absValue == 0;
}

Upvotes: 6

Views: 1487

Answers (4)

assylias
assylias

Reputation: 328568

Only one way to know: measure.

A quick JMH benchmark shows that Integer.parseInt is quicker, regardless of the magnitude of the integer. But we are talking about 10 nanoseconds or so, and it's unlikely to make much difference at your application level.

If you are 100% sure that your string is a valid integer, you can also parse it manually, which will be even faster (see the parseManual and the equalsIntString methods in the code below). Which method to use also depends on whether you expect the values to be often equal or often different (if they are often equal, parseManual works better, if you expect them to be different on average, equalsIntString is faster).

So the characteristics of your dataset also matters in the decision.

Full results (score is in nanoseconds per operations: smaller is better) - the (n) column is the integer that is being compared. The first part copmares equal values and the second part compares unequal values.

Benchmark                                       (n)  Mode  Samples   Score   Error  Units
c.a.p.SO30507506.manual                           1  avgt       10   6.579 ± 0.131  ns/op
c.a.p.SO30507506.manual                       12345  avgt       10  10.017 ± 0.401  ns/op
c.a.p.SO30507506.manual                   123456789  avgt       10  12.490 ± 0.243  ns/op
c.a.p.SO30507506.manualAtlaste                    1  avgt       10   7.914 ± 0.144  ns/op
c.a.p.SO30507506.manualAtlaste                12345  avgt       10  15.902 ± 0.593  ns/op
c.a.p.SO30507506.manualAtlaste            123456789  avgt       10  28.117 ± 0.463  ns/op
c.a.p.SO30507506.parse                            1  avgt       10   8.495 ± 0.325  ns/op
c.a.p.SO30507506.parse                        12345  avgt       10  21.614 ± 0.564  ns/op
c.a.p.SO30507506.parse                    123456789  avgt       10  34.692 ± 0.572  ns/op
c.a.p.SO30507506.stringEquals                     1  avgt       10  21.597 ± 0.594  ns/op
c.a.p.SO30507506.stringEquals                 12345  avgt       10  36.948 ± 1.144  ns/op
c.a.p.SO30507506.stringEquals             123456789  avgt       10  44.444 ± 1.011  ns/op

c.a.p.SO30507506.manual_unequal                   1  avgt       10   7.011 ± 0.149  ns/op
c.a.p.SO30507506.manual_unequal               12345  avgt       10  10.244 ± 0.350  ns/op
c.a.p.SO30507506.manual_unequal           123456789  avgt       10  13.135 ± 0.797  ns/op
c.a.p.SO30507506.manualAtlaste_unequal            1  avgt       10   4.328 ± 0.111  ns/op
c.a.p.SO30507506.manualAtlaste_unequal        12345  avgt       10   4.359 ± 0.115  ns/op
c.a.p.SO30507506.manualAtlaste_unequal    123456789  avgt       10   4.339 ± 0.103  ns/op
c.a.p.SO30507506.parse_unequal                    1  avgt       10   8.304 ± 0.251  ns/op
c.a.p.SO30507506.parse_unequal                12345  avgt       10  21.514 ± 0.405  ns/op
c.a.p.SO30507506.parse_unequal            123456789  avgt       10  35.257 ± 1.043  ns/op
c.a.p.SO30507506.stringEquals_unequal             1  avgt       10  19.060 ± 0.162  ns/op
c.a.p.SO30507506.stringEquals_unequal         12345  avgt       10  31.829 ± 0.427  ns/op
c.a.p.SO30507506.stringEquals_unequal     123456789  avgt       10  41.870 ± 0.252  ns/op

Code:

@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
public class SO30507506 {

  @Param({"1", "12345", "123456789"}) int n;
  int i;
  String s;

  @Setup public void setup() {
    i = n;
    s = String.valueOf(n);
  }

  @Benchmark public boolean parse() {
    return Integer.parseInt(s) == i;
  }

  @Benchmark public boolean stringEquals() {
    return s.equals(Integer.toString(i));
  }

  @Benchmark public boolean manual() {
    return parseManual(s) == i;
  }

  @Benchmark public boolean manualAtlaste() {
    return equalsIntString(i, s);
  }

  @Benchmark public boolean parse_unequal() {
    return Integer.parseInt(s) == i * 2;
  }

  @Benchmark public boolean stringEquals_unequal() {
    return s.equals(Integer.toString(i * 2));
  }

  @Benchmark public boolean manual_unequal() {
    return parseManual(s) == i * 2;
  }

  @Benchmark public boolean manualAtlaste_unequal() {
    return equalsIntString(i * 2, s);
  }

  private static int parseManual(String s) {
    int result = 0;
    int sign = s.charAt(0) == '-' ? -1 : 1;
    int startIndex = (s.charAt(0) >= '0' && s.charAt(0) <= '9') ? 0 : 1;
    for (int i = startIndex; i < s.length(); i++) {
      result *= 10;
      result += s.charAt(i) - '0';
    }
    return result * sign;
  }

  private static boolean equalsIntString(int value, String s) {
    if (s.isEmpty()) return false; // This is never good.
    if ((s.charAt(0) == '-' && value >= 0) || (s.charAt(0) != '-' && value < 0)) return false; // positive/negative check

    // Define the limit. This is basically the end of the string to check.
    int limit = 0;
    if (value < 0) {
      limit = 1;
      value = -value;
    }

    for (int i = s.length() - 1; i >= limit; --i) {
      char expected = (char) ('0' + (value % 10)); // the modulo will be optimized by the JIT because 10 is a constant
      value /= 10; // same story.

      if (s.charAt(i) != expected) return false;
    }
    return true;
  }
}

Upvotes: 5

atlaste
atlaste

Reputation: 31106

Interesting question. For clarity, there's a lot you can do wrong with testing this, because Java does stuff with strings that might influence the results. So let's start with building a proper test.

Constructing your test

Specifically: a proper test doesn't rely on the loadstring, because that influences memory allocation. You want to make a test using dynamically constructed strings.

The 10-log of your integer (e.g. the length of the string) will influence the test outcome. The longer the string, the longer Integer.tryParse will take. If you have a longer string, it will need to calculate more div/mul and take longer. An additional thing that influences performance is the '-' sign. If you have unsigned integers, this should be taken into account.

Basically measuring means:

  • Create strings with the proper length (depends on your data!!!). More strings = better
  • Create fail/pass an integer array that matches (or doesn't) with the string array.
  • Garbage collect.
  • Test with those two arrays.

Be sure to make a huge array for this during the test, so that your tests won't be influenced. Also make sure that the integers / random numbers that you use have the same characteristics as your data... Because of this, I cannot execute the tests for you, so I'll just stick to the theory.

String to integer equality

It helps to know how string to integer conversion works, so let's start with a blunt solution and work our way up. I currently don't have Java on my laptop, so I'm sorry for the C# syntax :-) You should be easily able to fix it though...

public int ConvertStringToInt(string s)
{
    int val = 0;
    if (s[0] == '-') // 1
    {
        for (int i = 1; i < s.Length; ++i )
        {
            if (s[i] >= '0' && s[i] <= '9') // 2
            {
                throw new Exception();
            }
            val = val * 10 + s[i] - '0';
        }
        return -val;
    }
    else
    {
        for (int i = 0; i < s.Length; ++i)
        {
            if (s[i] >= '0' && s[i] <= '9')
            {
                throw new Exception();
            }
            val = val * 10 + s[i] - '0';
        }
        return val;
    }
}

If you know for a fact that the numbers in the string are never negative, you can of course drop the condition 1. Also, if you know for a fact that the string is always a number (which is IMO implied in your question), you can optimize 2. A little trick that I usually use is to use arithmetic overflows to generate large unsigned numbers, which in turn removes an additional condition from 2. You'll end up with:

public int ConvertStringToInt(string s)
{
    int val = 0;
    if (s[0] == '-')
    {
        for (int i = 1; i < s.Length; ++i )
        {
            val = val * 10 + s[i] - '0';
        }
        return -val;
    }
    else
    {
        for (int i = 0; i < s.Length; ++i)
        {
            val = val * 10 + s[i] - '0';
        }
        return val;
    }
}

Next up, you want equality instead of conversion. So, how lazy can we evaluate this? Well, we need to parse pretty much all of the string before we can do the check. The only thing we know is that if we encounter a '-' char, we also need a negative integer. I ended up with this:

public bool EqualsStringInt(string s, int value)
{
    int val = 0;
    if (s[0] == '-')
    {
        if (value >= 0) { return false; } // otherwise we expected another char

        for (int i = 1; i < s.Length; ++i )
        {
            val = val * 10 + s[i] - '0'; // s[i] must be a char between '0'-'9' as implied by the question.
        }
        return (-val) == value;
    }
    else
    {
        if (value < 0) { return false; } // otherwise we expected another char

        for (int i = 0; i < s.Length; ++i)
        {
            val = val * 10 + s[i] - '0';
        }
        return val == value;
    }
}

Integer to string equality

I've written a bit of code in the past for C++ that converts integers to strings here: C++ performance challenge: integer to std::string conversion . There are some good solutions here as well that might be worth considering if you're really looking for performance.

Just checking equality is easier than that though. If you look closely at the algorithm, you'll notice:

  • Buffer overallocation. You don't need that. your tests WILL go wrong here if you don't wait for the GC and/or use static strings to seed the process!
  • Buffer reallocation. If you've filled the buffer sequentially, you need to invert it as well. If you don't want for GC, this will influence the test outcome!

Both of these should be time consuming in the long run, and both of them will influence your tests.

At this point, it's interesting to note that you don't really need the complete string though - you just need a single character. So, let's work with that:

  • Equality fails if the sign doesn't match
  • Equality fails if the first character doesn't match
  • Equality succeeds if all characters that are generated are the same.

Or, in code:

public bool EqualsIntString(int value, string s)
{
    if (s.Length == 0) { return false; } // This is never good.
    if ((s[0] == '-' && value >= 0) || (s[0] != '-' && value < 0)) { return false; } // positive/negative check

    // Define the limit. This is basically the end of the string to check.
    int limit = 0;
    if (value < 0) // 1
    {
        limit = 1;
        value = -value;
    }

    for (int i=s.Length-1; i>=limit; --i)
    {
        char expected = (char)('0' + (value % 10)); // the modulo will be optimized by the JIT because 10 is a constant
        value /= 10; // same story.

        if (s[i] != expected) { return false; }
    }
    return true;
}

Again, if you don't have negative numbers, do the obvious optimization. by removing 1.

Can you do even faster? Well yes... that's why I posted the C++ link in the first place. Most of these algorithms can easily be adjusted for this 'equality' case.

Optional optimizations for the last solution

You can use a 10log to determine the length of the string. This implies a lower and an upper bound value to the integer. A simple lookup table can do this for you. However, 10log is quite slow if not properly implemented, so be sure to test this!

Which one is faster

Construct a proper test, and test it. I tried to test it here, but don't have the characteristics of your data, which I expect to make a difference.

Of course, if you don't need such blunt performance, use the standard implementations and equals, and test it.

Upvotes: 1

maaartinus
maaartinus

Reputation: 46382

I'd bet that

Integer.parseInt(str) == integer

is faster. All it takes is one multiplication and one addition per digit, while Integer.toString needs division and modulus per digit - the slowest int arithmetic operations.

Actually, there are some optimizations reducing the number of operations in Integer.toString, but the speed ratio is just too big.

Moreover, there's memory allocation for the new string.

OTOH, Integer.parseInt accepts all Unicode digits, not only 0 to 9, which slows it down a lot. I guess, Guava's Ints.parseInt is way faster.


In case the string can't be parsed, then you'd get a NumberFormatException. You can catch it, but it costs a lot of time and then Integer.toString is the winner.

Upvotes: 0

Martin Milan
Martin Milan

Reputation: 6390

I would go for the Interger.Parse(), which is likely to be more resilient in the face of different localisation cultures etc than the other approach...

Speed isn't really the issue - valid results are.

Upvotes: -1

Related Questions