How expensive is String to Int to String conversion in Scala?

Question

I am trying to check whether a particular string is an Int (or a Boolean, Long, and so on), but I don't actually need it parsed to an Int (or the other types). I can think of two options: do var.toInt.toString, or just return the original string on which toInt was invoked if it is an Int. Both work fine, but I wanted to know whether the former is a lot more expensive than the latter. FYI, I won't be parsing really long strings this way, but I will be going through terabytes of data.

Rex Kerr · Accepted Answer

First, to answer the exact question:

.toString takes a few tens of ns on my machines, with the details depending of course on the machine and also on how long the string representation of the int is (~2x difference between the shortest and longest). .toInt takes less (about 1/2 to 3/4 of the time of toString).

Does this matter? Well, over terabytes of data (mostly numeric, I'm guessing) you'll have trillions of tens of nanoseconds, or tens of thousands of seconds. So maybe it does; it'll be hours of extra compute time.
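As a rough back-of-the-envelope check of that estimate, here is a small sketch. The object and method names are hypothetical, and the inputs (one trillion values, 30 ns of extra work per value) are illustrative assumptions, not measurements:

```scala
object CostEstimate {
  // Extra wall-clock seconds spent if every value costs `nsPerValue`
  // additional nanoseconds (single-threaded, no other overhead assumed).
  def extraSeconds(values: Double, nsPerValue: Double): Double =
    values * nsPerValue * 1e-9

  // Same figure expressed in hours.
  def extraHours(values: Double, nsPerValue: Double): Double =
    extraSeconds(values, nsPerValue) / 3600.0
}
```

Under those assumed inputs, CostEstimate.extraSeconds(1e12, 30) comes to roughly 30,000 seconds, or a bit over eight hours.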

But that's not what's going to be happening. If you use toInt on something that is not an integer, you're going to throw an exception. Exceptions are really slow--usually at least a couple of microseconds. If more than about 1/1000 of your supposed ints are not actually ints, you'll be spending a huge fraction of your time creating big stack traces for your parsing exceptions and then throwing them away.

You can try to use a regex. That's generally about 10x more expensive than just doing the parse, but 10x cheaper than throwing the exception. Still not a good choice unless you have several extra compute days to throw at the problem, especially since the regex will only tell you if it's int-like, not if it's in range, so you have to parse it anyway (and catch exceptions or do fiddly bounds checking).
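To illustrate the range problem, here is a minimal sketch of a regex check. The object name and the pattern are my own choices for illustration, not something from a library:

```scala
import java.util.regex.Pattern

object IntRegex {
  // Compile once and reuse; recompiling per call would be far slower.
  // Optional sign followed by one or more ASCII digits.
  private val IntLike: Pattern = Pattern.compile("[+-]?\\d+")

  // True if `s` has the *shape* of an integer. This says nothing about
  // whether the value actually fits in a 32-bit Int.
  def looksLikeInt(s: String): Boolean = IntLike.matcher(s).matches()
}
```

IntRegex.looksLikeInt("9999999999") returns true even though "9999999999".toInt would throw a NumberFormatException, which is why the regex alone does not settle the question.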

So if you really want it to be fast, you end up having to do the validation manually, indexing through the string, grabbing characters with charAt, and so on. Yes, it's a pain. But if you parse it yourself, it'll be about as fast as a single .toInt. It's a big ugly block of code. Using java.lang.Character.digit is generally the way to go if people might have number values in other character sets (you can catch the -1 return and bail). Don't forget to handle positive and negative slightly differently (due to the different range).
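A minimal sketch of what such manual validation might look like follows. The object and method names are hypothetical, and I am assuming the same rules as java.lang.Integer.parseInt (optional leading + or -, base 10, 32-bit range). It accumulates the value as a negative number so that Int.MinValue, whose magnitude is one larger than Int.MaxValue, can be handled without overflow:

```scala
object IntCheck {
  /** True if `s` would parse as a 32-bit Int; never throws or allocates. */
  def isInt(s: String): Boolean = {
    if (s == null || s.isEmpty) return false
    val first = s.charAt(0)
    val negative = first == '-'
    var i = if (negative || first == '+') 1 else 0
    if (i == s.length) return false // a lone sign is not a number

    // Work in negative space: the allowed floor differs by sign.
    val limit = if (negative) Int.MinValue else -Int.MaxValue
    val multMin = limit / 10 // below this, multiplying by 10 overflows
    var acc = 0
    while (i < s.length) {
      // Character.digit returns -1 for anything that is not a base-10 digit
      // (it also accepts digits from other scripts).
      val d = Character.digit(s.charAt(i), 10)
      if (d < 0) return false
      if (acc < multMin) return false
      acc *= 10
      if (acc < limit + d) return false // the next digit would leave the range
      acc -= d
      i += 1
    }
    true
  }
}
```

With this sketch, IntCheck.isInt("2147483647") and IntCheck.isInt("-2147483648") are true, while IntCheck.isInt("2147483648") and IntCheck.isInt("12a") are false. The same loop extends to Long by widening the accumulator and limits.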

Addendum: you might think java.util.Scanner is just the ticket. It steps through data and has a hasNextInt method. Unfortunately, it's dreadfully slow.

See also What's the best way to check to see if a String represents an integer in Java? for Java answers to the question (none of which are ideal IMO).
