Reputation: 13
My test assignment when hiring was to optimize this code:
someString.replaceAll("\"", "'").replaceAll("\n", "").replace("[]", "{}");
To solve this problem, I wrote this method:
public String outputValue(String input) {
char[] inputArray = input.toCharArray();
StringBuilder builder = new StringBuilder();
for (int i = 0; i < inputArray.length; i++) {
if (i != inputArray.length - 1 && inputArray[i] == '[' && inputArray[i + 1] == ']' ) {
builder.append("{}");
i++;
}
else if (inputArray[i] == '"')
builder.append("'");
else if (inputArray[i] == '\n' )
continue;
else builder.append(inputArray[i]);
}
return builder.toString();
}
I had many options for solving this problem, but this seemed the most optimal. I tested the speed using the Instant class, and on any line size it showed me high results. The firm told me that in their measurements there is degradation on large lines, that this is due to an overhead for using StringBuilder and unconditional allocation of a new line at the end.
So i didn't get a job.
What solution could be even more optimized? Even using a char array, I cannot do without string allocation String.valueOf().
Upvotes: 1
Views: 126
Reputation: 10151
Instead of using the StringBuilder you could modify the inputArray
directly and use it as result (possible, because your replaces can only shorten the char-sequence)
This would be my resulting version:
public static String outputValue(final String input) {
final char[] dataArray = input.toCharArray();
int destPos = 0;
int srcPos = 0;
final int len = dataArray.length;
while (srcPos < len) {
if ('"' == dataArray[srcPos]) {
dataArray[destPos++] = '\'';
srcPos++;
continue;
}
if ('\n' == dataArray[srcPos]) {
srcPos++;
continue;
}
if ('[' == dataArray[srcPos] && srcPos < len - 1 && ']' == dataArray[srcPos + 1]) {
dataArray[destPos++] = '{';
dataArray[destPos++] = '}';
srcPos += 2;
continue;
}
dataArray[destPos++] = dataArray[srcPos++];
}
return String.copyValueOf(dataArray, 0, destPos);
}
Thoughts about the code:
dataArray
(and the created return-String).Speed: I made rough speed test of the original, my own and the version of
rzwitserloot (with a long String dats-len: 268435456 characters
output-len 251658240
At least running from within eclipse, as JUnit-test,mine is faster than both the others.
Output of my test:
date-len 268435456
out-len 251658240
mr smith42: 1007
rzwitserloot: 1512
Orgiginal: 1950
mr smith42: 834
rzwitserloot: 1370
Orgiginal: 1870
mr smith42: 789
rzwitserloot: 1214
Orgiginal: 1945
mr smith42: 720
rzwitserloot: 1142
Orgiginal: 1572
mr smith42: 672
rzwitserloot: 1073
Orgiginal: 1739
mr smith42: 925
rzwitserloot: 1240
Orgiginal: 1823
mr smith42: 834
rzwitserloot: 1284
Orgiginal: 1676
mr smith42: 840
rzwitserloot: 1230
Orgiginal: 1621
mr smith42: 811
rzwitserloot: 1260
Orgiginal: 1695
mr smith42: 840
rzwitserloot: 1257
Orgiginal: 1707
mr smith42: 868
rzwitserloot: 1309
Orgiginal: 1716
mr smith42: 837
rzwitserloot: 1262
Orgiginal: 1683
mr smith42: 835
rzwitserloot: 1240
Orgiginal: 1632
mr smith42: 824
rzwitserloot: 1255
Orgiginal: 1635
mr smith42: 844
rzwitserloot: 1243
....
Here my rough test: (Using a real profiling tool should lead to similar results)
@Test
public void test() {
String data = "jksdhfewfnku\n[] [ {} \" [] }\nss";
data += data;
data += data;
data += data;
data += data;
data += data;
data += data;
data += data;
data += data;
data += data;
data += data;
data += data;
data += data;
data += data;
data += data;
data += data;
data += data;
data += data;
data += data;
data += data;
data += data;
data += data;
data += data;
data += data;
String expected = "jksdhfewfnku{} [ {} ' {} }ss";
expected += expected;
expected += expected;
expected += expected;
expected += expected;
expected += expected;
expected += expected;
expected += expected;
expected += expected;
expected += expected;
expected += expected;
expected += expected;
expected += expected;
expected += expected;
expected += expected;
expected += expected;
expected += expected;
expected += expected;
expected += expected;
expected += expected;
expected += expected;
expected += expected;
expected += expected;
expected += expected;
System.err.println("date-len " + data.length());
System.err.println("out-len " + expected.length());
// run it multible times to give the VM a chance for optimization
for (int i = 20; 0 < i; i--) {
final long t = System.currentTimeMillis();
Assert.assertEquals(expected, T.outputValue(data));
System.err.println("mr smith42: " + (System.currentTimeMillis() - t));
final long t2 = System.currentTimeMillis();
Assert.assertEquals(expected, T.outputValue2(data));
System.err.println("rzwitserloot: " + (System.currentTimeMillis() - t2));
final long t3 = System.currentTimeMillis();
Assert.assertEquals(expected, T.outputValueOrgiginal(data));
System.err.println("Orgiginal: " + (System.currentTimeMillis() - t3));
}
}
Upvotes: 1
Reputation:
These interview questions are more about what questions you ask the interviewer. The interviewer is the customer and you are the contractor in to solve their problem.
It's their system, you were asked to optimize. What questions are you going to ask them?
The other case is that they have no idea how to optimize that line of code and they are just running some tools someone (much more talented than they are) made to find the bottle neck.
You are there to give them free code to run through their tools. You aren't getting the job because there wasn't one to begin with. Shady? Yes. Done often in all industries and all disciplines? Yes.
Upvotes: 0
Reputation: 102812
StringBuilder builder = new StringBuilder();
StringBuilder has no idea how large the output is going to be, so you do get amortized growth cost here: SB isn't voodoo magic, it needs to guess at sizes. It'll guess 'eh, I'll give you 10 chars worth'. If you then add the 11th char, SB has to make a new char array (a larger one; I think it grows by a factor x2, might be x1.5), so it makes a 20-char array, copies over the first 10 chars, and tosses out the old array. This is not too bad (by definition this growth penalty is applied to an ever exponentially decreasing number of append
operations), but...
look at the input question: The output string is as large as the input string, OR smaller (every \n
in the input results in the output being 1 smaller). So, you can tell StringBuilder beforehand: Hey, just start out with this much capacity (the length of the input). Now there is no growth penalty anymore.
that this is due to an overhead for using StringBuilder
This is misleading or oversimplified; you may not have quite heard that correctly. StringBuilder is just an object, with a char array. That is overhead, yeah, but,
someString.replaceAll("\"", "'").replaceAll("\n", "").replace("[]", "{}");
that makes 2 pattern objects and (potentially) 3 strings. strings also are just an object with a big char array inside, so that's potentially more overhead and thus does not explain in the slightest why your take can be a lot slower.
However, there is another optimization you missed:
The replace
/replaceAll
methods don't make any string, AT ALL, if there is no need. Try it: "Hello!".replace("Foo", "Bar")
returns the same reference (a == b
equal).
Also, .toCharArray
does require yet another full clone, also pricey. Just use charAt
.
So, let's make an actual fast one:
public String outputValue(String input) {
StringBuilder builder = null;
int len = input.length();
for (int i = 0; i < len; i++) {
char c = input.charAt(i);
if (c != '[' && c != '\n' && c != '"') {
if (builder != null) builder.append(c);
continue;
}
if (builder == null) {
// Don't even make the builder until we determine we need it!
builder = new StringBuilder(len);
builder.append(input, 0, i);
}
if (i != leg - 1 && c == '[' && input.charAt(i + 1) == ']' ) {
builder.append("{}");
i++;
} else if (c == '"') {
builder.append('\'');
} else if (c != '\n') {
builder.append(c);
}
}
return builder == null ? input : builder.toString();
}
I tested the speed using the Instant class, and on any line size it showed me high results.
Not how you test performance. Instant represents system clock; it has only at best microsecond granularity (and usually not even that), and can and will change if the net time daemon changes the time on you. It can also be under smear (when a clock needs adjusting, because so much software out there assumes time can never run backwards, instead of just moving the time back, a smearing NTP will simply run time 'slower' - ticking up 750 milliseconds every second for example, until time's 'fixed').
Even nanoTime is not great, though - benchmarking is way more complicated than this. Use JMH (Java Microbenchmark Harness).
If you want to see some examples where your code does way worse than the original .replaceAll/replaceAll/replace
chain, make a bunch of strings that are large and which do not contain any of these symbols (no newlines, no quotes, no brackets). You'll see a very big difference.
But, interview questions aren't neccessarily about code optimization. For example, I find these just as plausible:
Change almost nothing; replace is reasonably fast. However, .replaceAll
involves a regex. If this line gets executed a ton, making an explicit pattern object would be more efficient and read better (though that is in the eye of the beholder). More generally, the patterns here don't use any regex features, so maybe the interviewer just wanted you to replace replaceAll
with replace
and that is all. To be clear, my code above will be faster if you have a large input with newlines and quotes and brackets in it, and will not be slower if there aren't. But, not every line of code needs to be optimized to the maximum possible extent.
Before digging into the code at all, the exercise tested your willingness to ask questions and look for the bigger picture first. Maybe they wanted you to ask: How often is this run? Is there a performance framework in place? What kinds of strings are usually thrown at this method? Can we perhaps look elsewhere and fix the strings 'at the source' instead of converting them every time? When I do an interview, I tend to craft technical questions that seem complete but which have some missing information once you dig down enough - if the interviewee doesn't ask, I dock them some points. It's not an instant fail - I'll tell em and see if they bother to ask and debate first before diving into programming for the next question.
Upvotes: 2