Ólafur Waage
Ólafur Waage

Reputation: 69991

What is an efficient way to replace many characters in a string?

String handling in Java is something I'm trying to learn to do well. Currently I want to take in a string and replace any characters I find.

Here is my current inefficient (and kinda silly IMO) function. It was written to just work.

public String convertWord(String word)
{
    return word.toLowerCase().replace('á', 'a')
                             .replace('é', 'e')
                             .replace('í', 'i')
                             .replace('ú', 'u')
                             .replace('ý', 'y')
                             .replace('ð', 'd')
                             .replace('ó', 'o')
                             .replace('ö', 'o')
                             .replaceAll("[-]", "")
                             .replaceAll("[.]", "")
                             .replaceAll("[/]", "")
                             .replaceAll("[æ]", "ae")
                             .replaceAll("[þ]", "th");
}

I ran 1.000.000 runs of it and it took 8182ms. So how should I proceed in changing this function to make it more efficient?

Solution found:

Converting the function to this

public String convertWord(String word)
{
    StringBuilder sb = new StringBuilder();

    char[] charArr = word.toLowerCase().toCharArray();

    for(int i = 0; i < charArr.length; i++)
    {
        // Single character case
        if(charArr[i] == 'á')
        {
            sb.append('a');
        }
        // Char to two characters
        else if(charArr[i] == 'þ')
        {
            sb.append("th");
        }
        // Remove
        else if(charArr[i] == '-')
        {
        }
        // Base case
        else
        {   
            sb.append(word.charAt(i));
        }
    }

    return sb.toString();
}

Running this function 1.000.000 times takes 518ms. So I think that is efficient enough. Thanks for the help guys :)

Upvotes: 29

Views: 21573

Answers (9)

fabiolimace
fabiolimace

Reputation: 1197

I just implemented this utility class that replaces a char or a group of chars of a String. It is equivalent to bash tr and perl tr///, aka, transliterate. I hope it helps someone!

package your.package.name;

/**
 * Utility class that replaces chars of a String, aka, transliterate.
 * 
 * It's equivalent to bash 'tr' and perl 'tr///'.
 *
 */
public class ReplaceChars {

    public static String replace(String string, String from, String to) {
        return new String(replace(string.toCharArray(), from.toCharArray(), to.toCharArray()));
    }

    public static char[] replace(char[] chars, char[] from, char[] to) {

        char[] output = chars.clone();
        for (int i = 0; i < output.length; i++) {
            for (int j = 0; j < from.length; j++) {
                if (output[i] == from[j]) {
                    output[i] = to[j];
                    break;
                }
            }
        }
        return output;
    }

    /**
     * For tests!
     */
    public static void main(String[] args) {

        // Example from: https://en.wikipedia.org/wiki/Caesar_cipher
        String string = "THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG";
        String from = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
        String to = "XYZABCDEFGHIJKLMNOPQRSTUVW";

        System.out.println();
        System.out.println("Cesar cypher: " + string);
        System.out.println("Result:       " + ReplaceChars.replace(string, from, to));
    }
}

This is the output:

Cesar cypher: THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG
Result:       QEB NRFZH YOLTK CLU GRJMP LSBO QEB IXWV ALD

Upvotes: 0

Peter Lawrey
Peter Lawrey

Reputation: 533492

You could create a table of String[] which is Character.MAX_VALUE in length. (Including the mapping to lower case)

As the replacements got more complex, the time to perform them would remain the same.

private static final String[] REPLACEMENT = new String[Character.MAX_VALUE+1];
static {
    for(int i=Character.MIN_VALUE;i<=Character.MAX_VALUE;i++)
        REPLACEMENT[i] = Character.toString(Character.toLowerCase((char) i));
    // substitute
    REPLACEMENT['á'] =  "a";
    // remove
    REPLACEMENT['-'] =  "";
    // expand
    REPLACEMENT['æ'] = "ae";
}

public String convertWord(String word) {
    StringBuilder sb = new StringBuilder(word.length());
    for(int i=0;i<word.length();i++)
        sb.append(REPLACEMENT[word.charAt(i)]);
    return sb.toString();
} 

Upvotes: 20

Prince John Wesley
Prince John Wesley

Reputation: 63688

My implementation is based on look up table.

public static String convertWord(String str) {
    char[] words = str.toCharArray();
    char[] find = {'á','é','ú','ý','ð','ó','ö','æ','þ','-','.',
            '/'};
    String[] replace = {"a","e","u","y","d","o","o","ae","th"};
    StringBuilder out = new StringBuilder(str.length());
    for (int i = 0; i < words.length; i++) {
        boolean matchFailed = true;
        for(int w = 0; w < find.length; w++) {
            if(words[i] == find[w]) {
                if(w < replace.length) {
                    out.append(replace[w]);
                }
                matchFailed = false;
                break;
            }
        }
        if(matchFailed) out.append(words[i]);
    }
    return out.toString();
}

Upvotes: 5

bruno conde
bruno conde

Reputation: 48265

My first choice would be to use a StringBuilder because you need to remove some chars from the string.

Second choice would be to iterate throw the array of chars and add the treated char to another array of the inicial size of the string. Then you would need to copy the array to trim the possible unused positions.

After that, I would make some performance tests to see witch one is better.

Upvotes: 2

mikera
mikera

Reputation: 106351

My suggestion would be:

  • Convert the String to a char[] array
  • Run through the array, testing each character one by one (e.g. with a switch statement) and replacing it if needed
  • Convert the char[] array back to a String

I think this is probably the fastest performance you will get in pure Java.

EDIT: I notice you are doing some changes that change the length of the string. In this case, the same principle applies, however you need to keep two arrays and increment both a source index and a destination index separately. You might also need to resize the destination array if you run out of target space (i.e. reallocate a larger array and arraycopy the existing destination array into it)

Upvotes: 8

Dunaril
Dunaril

Reputation: 2795

What i see being inefficient is that you are gonna check again characters that have already been replaced, which is useless.

I would get the charArray of the String instance, iterate over it, and for each character spam a series of if-else like this:

char[] array = word.toCharArray();
for(int i=0; i<array.length; ++i){
    char currentChar = array[i];
    if(currentChar.equals('é'))
        array[i] = 'e';
    else if(currentChar.equals('ö'))
        array[i] = 'o';
    else if(//...
}

Upvotes: 0

bstack
bstack

Reputation: 2528

Any time we have problems like this we use regular expressions are they are by far the fastest way to deal with what you are trying to do.

Have you already tried regular expressions?

Upvotes: 0

Ovidiu Pacurar
Ovidiu Pacurar

Reputation: 8209

Use the function String.replaceAll. Nice article similar with what you want: link

Upvotes: 0

dcn
dcn

Reputation: 4469

I doubt, that you can speed up the 'character replacement' at all really. As for the case of regular expression replacement, you may compile the regexs beforehand

Upvotes: 0

Related Questions