TUNER88

Reputation: 923

Remove short words and characters from a string in Java

Input string:

String input = "Lorem Ipsum is simply dummy text of the printing and typesetting industry";

Output string:

String output = "Lorem Ipsum simply dummy printing typesetting industry";

What is the best way to remove short words?

Here is my first idea:

private String removeShortWords(String string){
    int minLength = 5;
    String result = "";

    String[] words = string.split("\\s+");

    for (int i = 0; i < words.length; i++){
        String word = words[i];
        if(word.length() >= minLength){
            result += word + " ";
        }
    }       

    return result;
}

Upvotes: 3

Views: 5204

Answers (5)

Thudani Hettimulla

Reputation: 774

Try a StringTokenizer instead of split, and use a StringBuilder to build the result:

int minLength = 5;
StringTokenizer tokenizer = new StringTokenizer(input, " ");
StringBuilder builder = new StringBuilder();
while (tokenizer.hasMoreTokens()) {
    String token = tokenizer.nextToken();
    if (token.length() >= minLength) {
        builder.append(token);
        builder.append(" ");
    }
}
return builder.toString();

Upvotes: 0

Dev

Reputation: 3580

Try this code:

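String input = "Lorem Ipsum is simply dummy text of the printing and typesetting industry";
String[] dev = input.split(" ");
for (int i = 0; i < dev.length; i++) {
    // Drops words of at most 2 characters; use <= 4 to match the question.
    if (dev[i].length() <= 2) {
        // \b limits the match to whole words; Pattern.quote (java.util.regex.Pattern) escapes regex metacharacters.
        input = input.replaceAll("\\b" + Pattern.quote(dev[i]) + "\\b", "");
    }
}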

Upvotes: 0

David Rabinowitz

Reputation: 30448

Your approach is OK, but for performance reasons it is better to use a StringBuilder, since += creates a new String on every iteration of the loop. Note also Maroun's comments regarding the integrity of the output.

Another option is to use a regular expression. This call should have a similar effect, although it leaves the spaces around the removed words in place:

return string.replaceAll("\\b\\w{1,4}\\b","");

Note that for performance reasons you would want to precompile the pattern and reuse it.
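A minimal sketch of what precompiling could look like, assuming the same regular expression as above. The class name ShortWordRegex and the constant SHORT_WORD are made up for illustration and are not part of the original answer.

```java
import java.util.regex.Pattern;

final class ShortWordRegex {
    // Compiled once and reused on every call; matches whole words of 1 to 4 word characters.
    private static final Pattern SHORT_WORD = Pattern.compile("\\b\\w{1,4}\\b");

    // Hypothetical helper: removes the short words but leaves the surrounding spaces untouched.
    static String removeShortWords(String text) {
        return SHORT_WORD.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(removeShortWords("Lorem Ipsum is simply dummy text"));
    }
}
```

Because only the words themselves are matched, the result keeps a double space where "is" used to be and a trailing space where "text" used to be.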

Upvotes: 1

Boann

Reputation: 50041

One line:

String output = input.replaceAll("\\b\\w{1,4}\\b\\s?", "");

Upvotes: 9

Maroun

Reputation: 95998

Your approach is fine except that:

  • You don't preserve the number of spaces when you rebuild the String.
  • You should use StringBuilder instead of +=.
  • You add a redundant space at the end.

I would do something like this:

Iterate over the String character by character. As long as the current character is not a space, I increment a counter and append the character to a temporary String. When I reach a space, I check the counter: if it is < 5 I discard the temporary String, otherwise I append it to the result. The space itself is always copied, so the original spacing is preserved too.

Regarding the complexity, it's O(n), where n is the length of the String, as we traverse the String only once.
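The single-pass idea described above could be sketched roughly as follows. This is one possible reading of the description, not tested code from the answer: the class name ShortWordScanner and the minLength parameter are made up for illustration, and the word's StringBuilder length stands in for the separate counter.

```java
final class ShortWordScanner {
    // Hypothetical helper: drops words shorter than minLength and copies every whitespace character as-is.
    static String removeShortWords(String text, int minLength) {
        StringBuilder result = new StringBuilder(text.length());
        StringBuilder word = new StringBuilder(); // characters of the word currently being read
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isWhitespace(c)) {
                // End of a word: keep it only if it is long enough.
                if (word.length() >= minLength) {
                    result.append(word);
                }
                word.setLength(0);
                result.append(c); // spacing is preserved exactly
            } else {
                word.append(c);
            }
        }
        // Flush the last word, which is not followed by whitespace.
        if (word.length() >= minLength) {
            result.append(word);
        }
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(removeShortWords("Lorem Ipsum is simply dummy text", 5));
    }
}
```

Each character is visited once and appended at most twice, so the running time stays linear in the length of the input.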

Upvotes: 1
