Kenneth Liu

Reputation: 13

Better way to repeatedly use characters from a string in Java

I need to read characters from a string many times, and I wonder whether it is better to call string.charAt() every time I need a character, or to save the char array once with string.toCharArray() and access the characters by index. I wrote a simple benchmark program and observed a significant performance difference.

static int[] loops = new int[]{10000, 100000, 1000000};

static void useCharAt(String s){
    int sum = 0;
    for(int loop : loops) {
        long start = System.currentTimeMillis();
        for (int i = 0; i < loop; i++) {
            for (int j = 0; j < s.length(); j++) {
                sum += s.charAt(j);
            }
        }
        System.out.println("string size is " + s.length() + ", loop size is "+loop+", charAt() costs " + (System.currentTimeMillis() - start) + " ms");
    }
}

static void useArray(String s){
    char[] arr= s.toCharArray();
    int sum = 0;
    for(int loop : loops) {
        long start = System.currentTimeMillis();
        for (int i = 0; i < loop; i++) {
            for (char c : arr) {
                sum += c;
            }
        }
        System.out.println("string size is " + s.length() + ", loop size is "+loop+", array costs " + (System.currentTimeMillis() - start) + " ms");
    }
}

public static void main(String[] args){
    StringBuilder sb = new StringBuilder();
    int[] strLen = new int[]{1000, 5000, 10000};
    for(int len : strLen) {
        sb.setLength(0);
        for(int i = 0; i < len; i++) sb.append('a');
        String s = sb.toString();
        useArray(s);
        useCharAt(s);
    }
}

The result is:

string size is 1000, loop size is 10000, array costs 10 ms
string size is 1000, loop size is 100000, array costs 60 ms
string size is 1000, loop size is 1000000, array costs 495 ms
string size is 1000, loop size is 10000, charAt() costs 14 ms
string size is 1000, loop size is 100000, charAt() costs 184 ms
string size is 1000, loop size is 1000000, charAt() costs 1649 ms

string size is 5000, loop size is 10000, array costs 23 ms
string size is 5000, loop size is 100000, array costs 232 ms
string size is 5000, loop size is 1000000, array costs 2277 ms
string size is 5000, loop size is 10000, charAt() costs 82 ms
string size is 5000, loop size is 100000, charAt() costs 828 ms
string size is 5000, loop size is 1000000, charAt() costs 8202 ms

string size is 10000, loop size is 10000, array costs 44 ms
string size is 10000, loop size is 100000, array costs 458 ms
string size is 10000, loop size is 1000000, array costs 4559 ms
string size is 10000, loop size is 10000, charAt() costs 166 ms
string size is 10000, loop size is 100000, charAt() costs 1626 ms
string size is 10000, loop size is 1000000, charAt() costs 16280 ms

Why is charAt() slower than direct access to the array? I checked the implementation of charAt() and, apart from the explicit bounds check, I see no difference from direct array access.

public char charAt(int index) {
    if ((index < 0) || (index >= value.length)) {
        throw new StringIndexOutOfBoundsException(index);
    }
    return value[index];
}

Upvotes: 1

Views: 81

Answers (1)

Lothar

Reputation: 5449

Calling toCharArray() has an up-front cost, because the String's internal array is copied.
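To illustrate that the returned array really is a copy, here is a minimal sketch. The class and method names are made up for this example; only toCharArray() and charAt() come from the JDK.

```java
class ToCharArrayCopyDemo {
    // Hypothetical helper: returns true if changing the array returned by
    // toCharArray() leaves the original String untouched.
    static boolean arrayIsIndependentCopy(String s) {
        char[] arr = s.toCharArray();      // copies the String's characters
        if (arr.length == 0) {
            return true;                   // nothing to modify
        }
        arr[0] = (char) (arr[0] + 1);      // change the copy only
        return s.charAt(0) != arr[0];      // the String still holds the old character
    }

    public static void main(String[] args) {
        System.out.println(arrayIsIndependentCopy("hello")); // prints true
    }
}
```

Since the copy is made once per call, it pays off only when the array is reused for many accesses.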

From then on, each access is a plain array access. The array access carries an implicit bounds check, and the same implicit check also happens inside charAt() when the value is returned. Each call to charAt() additionally pays for a method call and for a second, explicit bounds check, which exists so that a StringIndexOutOfBoundsException is thrown instead of an ArrayIndexOutOfBoundsException.
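The difference in exception types can be seen with a small sketch like the following. The class and helper names are invented for the example, and it assumes the charAt() implementation shown in the question.

```java
class BoundsCheckDemo {
    // Hypothetical helper: name of the exception thrown by s.charAt(index), or "none".
    static String exceptionFromCharAt(String s, int index) {
        try {
            s.charAt(index);
            return "none";
        } catch (IndexOutOfBoundsException e) {
            return e.getClass().getSimpleName();
        }
    }

    // Hypothetical helper: name of the exception thrown by arr[index], or "none".
    static String exceptionFromArray(char[] arr, int index) {
        try {
            char c = arr[index];           // implicit bounds check done by the JVM
            return "none";
        } catch (IndexOutOfBoundsException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        String s = "abc";
        System.out.println(exceptionFromCharAt(s, 5));               // explicit check in charAt()
        System.out.println(exceptionFromArray(s.toCharArray(), 5));  // implicit array check
    }
}
```

Both exception classes extend IndexOutOfBoundsException, so a single catch clause covers both paths.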

This effect is well known and was already described in early Java performance books.

In short: if you only need a single character from a String, you're better off with charAt(). If you access many or all of the characters, and the String may be long, you're better off calling toCharArray() once and going through the array instead.
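As a rough sketch of the two access patterns side by side (class and method names are made up for this example), both methods return the sum so the loops produce a result that is actually used. The timing in main is only indicative: it has no warm-up phase, and results depend on the JVM version and JIT behaviour, so a dedicated harness would be needed for reliable numbers.

```java
class CharAccessSketch {
    // Reads every character through charAt() on each pass.
    static long sumWithCharAt(String s, int repetitions) {
        long sum = 0;
        for (int i = 0; i < repetitions; i++) {
            for (int j = 0; j < s.length(); j++) {
                sum += s.charAt(j);
            }
        }
        return sum;
    }

    // Copies the characters once, then reads from the array on each pass.
    static long sumWithArray(String s, int repetitions) {
        char[] arr = s.toCharArray();      // one-time copy
        long sum = 0;
        for (int i = 0; i < repetitions; i++) {
            for (char c : arr) {
                sum += c;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1000; i++) sb.append('a');
        String s = sb.toString();

        long start = System.nanoTime();
        long viaCharAt = sumWithCharAt(s, 10000);
        long charAtNanos = System.nanoTime() - start;

        start = System.nanoTime();
        long viaArray = sumWithArray(s, 10000);
        long arrayNanos = System.nanoTime() - start;

        // Both sums are printed so the work cannot be skipped as unused.
        System.out.println("charAt(): sum=" + viaCharAt + ", " + charAtNanos / 1_000_000 + " ms");
        System.out.println("array:    sum=" + viaArray + ", " + arrayNanos / 1_000_000 + " ms");
    }
}
```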

Upvotes: 2
