user3639557

Reputation: 5281

Best way to remove the first word in a string in Java

What is the fastest way to get rid of the first token in a string? So far, I've tried this:

String parentStringValue = this.stringValue.split(" ", 2)[1];

This is extremely inefficient in both memory and speed when repeated millions of times on 15-word strings. Assume the string is made of tokens separated by spaces.

Upvotes: 1

Views: 9592

Answers (7)

Rudi Kershaw

Reputation: 12972

StringBuilder vs substring( x ) vs split( x ) vs Regex

Answer edited: major flaws corrected

After correcting some fairly major flaws in my benchmarking (as pointed out by Jay Askren in the comments), the StringBuilder method came out as the fastest by a significant margin (although this assumes that the StringBuilder objects were pre-created), with substring in second place. split() came second to last, at about 10x slower than the StringBuilder method.

  ArrayList<String> strings = new ArrayList<String>();
  ArrayList<StringBuilder> stringBuilders = new ArrayList<StringBuilder>();
  for(int i = 0; i < 1000; i++) strings.add("Remove the word remove from String "+i);
  for(int i = 0; i < 1000; i++) stringBuilders.add(new StringBuilder(i+" Remove the word remove from String "+i));
  Pattern pattern = Pattern.compile("\\w+\\s");

  // StringBuilder method
  long before = System.currentTimeMillis();
  for(int i = 0; i < 5000; i++){
      for(StringBuilder s : stringBuilders){
          s.delete(0, s.indexOf(" ") + 1);
      }
  }
  long after = System.currentTimeMillis() - before;
  System.out.println("StringBuilder Method Took "+after);

  // Substring method
  before = System.currentTimeMillis();
  for(int i = 0; i < 5000; i++){
      for(String s : strings){
          String newvalue = s.substring(s.indexOf(" ") + 1);
      }
  }
  after = System.currentTimeMillis() - before;
  System.out.println("Substring Method Took "+after); 

  //Split method
  before = System.currentTimeMillis();
  for(int i = 0; i < 5000; i++){
      for(String s : strings){
          String newvalue = s.split(" ", 2)[1];
          System.out.print("");
      }
  }
  after = System.currentTimeMillis() - before;
  System.out.println("Split Method Took "+after);

  // Regex method
  before = System.currentTimeMillis();
  for(int i = 0; i < 5000; i++){
      for(String s : strings){
          String newvalue = pattern.matcher(s).replaceFirst("");
      }
  }
  after = System.currentTimeMillis() - before;
  System.out.println("Regex Method Took "+after);

I ran the above in random orders, after a warm-up, in succession while taking averages. I also increased the number of operations from 5 million to 30 million and ran each method ten times before moving on to the next. In every case the order from fastest to slowest stayed the same. Below is some sample output from the code above:

StringBuilder Method Took 203
Substring Method Took 588
Split Method Took 1833
Regex Method Took 2517

It is worth mentioning that calling split() with a delimiter String longer than one character simply uses regex in its implementation, so in that case there should be no difference between using split() and a Pattern object.
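The warm-up and repeated runs described above could be arranged roughly as follows. This is only a sketch under assumptions: the class `TimingSketch`, the `sink` field, and the `timeMillis` helper are names made up for illustration, and the iteration counts are arbitrary. Summing the result lengths into a field is one common way to keep the JIT from discarding the measured work. For serious measurements a dedicated harness such as JMH is usually a better choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical harness; class, field, and method names are illustrative only.
class TimingSketch {
    // Results are accumulated here so the measured work stays "used".
    static long sink;

    // Applies op to every input, `iterations` times, and returns elapsed milliseconds.
    static long timeMillis(List<String> inputs, int iterations, Function<String, String> op) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            for (String s : inputs) {
                sink += op.apply(s).length();
            }
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        List<String> inputs = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            inputs.add("Remove the word remove from String " + i);
        }
        Function<String, String> substring = s -> s.substring(s.indexOf(' ') + 1);
        Function<String, String> split = s -> s.split(" ", 2)[1];

        // Warm-up pass: the timings from these calls are thrown away.
        timeMillis(inputs, 1000, substring);
        timeMillis(inputs, 1000, split);

        // Ten measured runs per method, one method at a time.
        for (int run = 0; run < 10; run++) {
            System.out.println("Substring took " + timeMillis(inputs, 5000, substring) + " ms");
        }
        for (int run = 0; run < 10; run++) {
            System.out.println("Split took " + timeMillis(inputs, 5000, split) + " ms");
        }
    }
}
```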

Upvotes: 6

Jay Askren

Reputation: 10444

Rudi's benchmark had quite a few problems, including unfairly and incorrectly favoring the split method, so I took his benchmark and improved upon it. If you happen to already have a bunch of StringBuilders, the StringBuilder approach is slightly faster, but if you need to convert them from Strings first, it is quite slow. The substring approach is the next fastest, and it is the one you should use if you have Strings rather than StringBuilders. Commons Lang is next, and both the substring method and the Commons Lang method are 4 to 5 times faster than split. String.replaceFirst() uses regular expressions and is very slow because it compiles the regular expression on every call, which doubles the running time. Even without the compile step, the regex approach is significantly slower than the others.

Below is the code for the benchmark. You will need to add Apache Commons Lang to your classpath to run it.

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

import org.apache.commons.lang3.StringUtils;

/**
 * Benchmarks several ways of removing the first space-delimited token from a String.
 */
public class StringTest {
    public static void main(String[] args) {
        int numIterations = 100000;
        int numRuns = 10;
        ArrayList<String> strings = new ArrayList<String>();
          for(int i = 0; i < 1000; i++) strings.add("Remove the word remove from String "+i);
          // Split method
          long before = 0;
          long after = 0;
          for(int j=0; j < numRuns; j++) {
              before = System.currentTimeMillis();
              for(int i = 0; i < numIterations; i++){
                  for(String s : strings){
                      String newvalue = s.split(" ", 2)[1];
                      // System.out.println("split " + newvalue);
                  }
              }
              after = System.currentTimeMillis() - before;
              System.out.println("Split Took "+after + " ms");
          }


          // Substring method
          for(int j=0; j < numRuns; j++) {
              before = System.currentTimeMillis();
              for(int i = 0; i < numIterations; i++){
                  for(String s : strings){
                      String newvalue = s.substring(s.indexOf(" ") + 1);
                  }
              }
              after = System.currentTimeMillis() - before;
              System.out.println("Substring Took "+after + " ms");
          }



          // Apache Commons Lang method
          for(int j=0; j < numRuns; j++) {
              before = System.currentTimeMillis();
              for(int i = 0; i < numIterations; i++){
                  for (String s : strings) {
                      String parentStringValue = StringUtils.substringAfter(s, " ");
                  }
              }
              after = System.currentTimeMillis() - before;
              System.out.println("CommonsLang Took "+after + " ms");
          }


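          // StringBuilder method (delete time is reported separately from the String-to-StringBuilder conversion)
          for(int j=0; j < numRuns; j++) {
              long deleteTime = 0L;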
              before = System.currentTimeMillis();
              for(int i = 0; i < numIterations; i++){

                  List<StringBuilder> stringBuilders = new ArrayList<StringBuilder>();
                  for (String s : strings) {
                      stringBuilders.add(new StringBuilder(s));
                  }
                  long beforeDelete = System.currentTimeMillis();
                  for (StringBuilder s : stringBuilders) {
                      s.delete(0, s.indexOf(" ") + 1);
                  }
                  deleteTime+=(System.currentTimeMillis() - beforeDelete);
              }
              after = System.currentTimeMillis() - before;
              System.out.println("StringBuilder Delete " + deleteTime + " ms out of " + after + " total ms");
          }

          // Faster Regex method
          Pattern pattern = Pattern.compile("\\w+\\s");
          for(int j=0; j < numRuns; j++) {
              before = System.currentTimeMillis();
              for(int i = 0; i < numIterations; i++){
                  for (String s : strings) {
                      String newvalue = pattern.matcher(s).replaceFirst("");
                  }
              }
              after = System.currentTimeMillis() - before;
              System.out.println("Faster Regex Took "+after + " ms");
          }

          // Slow Regex method
          for(int j=0; j < numRuns; j++) {
              before = System.currentTimeMillis();
              for(int i = 0; i < numIterations; i++){
                  for (String s : strings) {
                      String newvalue = s.replaceFirst("\\w+\\s", "");
                  }
              }
              after = System.currentTimeMillis() - before;
              System.out.println("Slow Regex Took " + after + " ms");
          }

    }
}

On my machine with an i7 processor I got the following results:

Split Took 10552 ms
Split Took 10298 ms
Split Took 10297 ms
Split Took 10292 ms
Split Took 10527 ms
Split Took 10356 ms
Split Took 10324 ms
Split Took 10283 ms
Split Took 10375 ms
Split Took 10346 ms
Substring Took 2385 ms
Substring Took 2354 ms
Substring Took 2363 ms
Substring Took 2358 ms
Substring Took 2361 ms
Substring Took 2367 ms
Substring Took 2370 ms
Substring Took 2350 ms
Substring Took 2354 ms
Substring Took 2397 ms
CommonsLang Took 2462 ms
CommonsLang Took 2461 ms
CommonsLang Took 2422 ms
CommonsLang Took 2426 ms
CommonsLang Took 2479 ms
CommonsLang Took 2441 ms
CommonsLang Took 2440 ms
CommonsLang Took 2420 ms
CommonsLang Took 2418 ms
CommonsLang Took 2421 ms
StringBuilder Delete 2302 ms out of 5904 total ms
StringBuilder Delete 2272 ms out of 5908 total ms
StringBuilder Delete 2241 ms out of 5879 total ms
StringBuilder Delete 2263 ms out of 5856 total ms
StringBuilder Delete 2285 ms out of 5858 total ms
StringBuilder Delete 2305 ms out of 5864 total ms
StringBuilder Delete 2287 ms out of 5854 total ms
StringBuilder Delete 2238 ms out of 5890 total ms
StringBuilder Delete 2335 ms out of 5875 total ms
StringBuilder Delete 2301 ms out of 5863 total ms
Faster Regex Took 18387 ms
Faster Regex Took 18331 ms
Faster Regex Took 18421 ms
Faster Regex Took 18356 ms
Faster Regex Took 18297 ms
Faster Regex Took 18416 ms
Faster Regex Took 18338 ms
Faster Regex Took 18467 ms
Faster Regex Took 18326 ms
Faster Regex Took 18355 ms
Slow Regex Took 35748 ms
Slow Regex Took 35855 ms
Slow Regex Took 35924 ms
Slow Regex Took 35761 ms
Slow Regex Took 35764 ms
Slow Regex Took 35698 ms
Slow Regex Took 35646 ms
Slow Regex Took 35637 ms
Slow Regex Took 35871 ms
Slow Regex Took 35781 ms

Upvotes: 1

HaRLoFei

Reputation: 316

Try using StringBuffer or StringBuilder for string operations so that you don't leave behind a lot of unused new objects, which hurts memory efficiency when the operation is repeated millions of times, as you mentioned.
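A minimal sketch of that idea, assuming the text already lives in a StringBuilder (the benchmark in another answer found that converting from String first is slow). The class and method names are hypothetical, and leaving the builder untouched when it contains no space is a choice made here for illustration.

```java
// Hypothetical helper class; the names are illustrative only.
class FirstTokenBuilderSketch {
    // Removes everything up to and including the first space, in place.
    // Assumption: a builder with no space is left unchanged.
    static void dropFirstToken(StringBuilder sb) {
        int space = sb.indexOf(" ");
        if (space >= 0) {
            sb.delete(0, space + 1);
        }
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("Remove the word remove from String");
        dropFirstToken(sb);
        System.out.println(sb); // the word remove from String
        dropFirstToken(sb);
        System.out.println(sb); // word remove from String
    }
}
```

Because the same builder is mutated on every call, no new String is allocated per removal.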

Upvotes: 1

Tom Mac

Reputation: 9853

If you are not averse to using Apache Commons, you could use the StringUtils class.

This means you don't have to cater for String.indexOf returning -1:

String parentStringValue = StringUtils.substringAfter(yourString, " ");

Upvotes: 0

Jens

Reputation: 69440

Try this:

  String s = "This is a test";

  System.out.println(s.replaceFirst("\\w+\\s", ""));
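If this runs many times, one option is to compile the pattern once and reuse it, since String.replaceFirst compiles the regex on each call. The sketch below uses the same regex as above; the class name, constant, and method are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative only.
class PrecompiledRegexSketch {
    // Compiled once and reused across calls.
    private static final Pattern FIRST_WORD = Pattern.compile("\\w+\\s");

    static String dropFirstWord(String s) {
        return FIRST_WORD.matcher(s).replaceFirst("");
    }

    public static void main(String[] args) {
        System.out.println(dropFirstWord("This is a test")); // is a test
        // The pattern is not anchored, so it removes the first run of word
        // characters followed by whitespace, wherever that run occurs.
        System.out.println(dropFirstWord("-- foo bar"));     // -- bar
    }
}
```

Note that this changes only the compile cost; the matching itself still goes through the regex engine.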

Upvotes: 1

Ruchira Gayan Ranaweera

Reputation: 35557

There is no need to split and create an array; just use substring:

String str="I want to remove I";
String parentStringValue = str.substring(str.indexOf(" ")+1);
System.out.println(parentStringValue);

Output:

want to remove I

Upvotes: 5

Mena

Reputation: 48404

You can use a combination of String.substring and String.indexOf for this.

Something along the lines of:

// TODO check indexOf does not return -1
this.stringValue.substring(this.stringValue.indexOf(" ") + 1)
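To make the TODO concrete: when there is no space, indexOf returns -1, so the expression becomes substring(0) and silently returns the whole string rather than throwing. A guarded version might look like the sketch below. The class and method names are hypothetical, and returning an empty string for a single-token input is an assumption; the right behaviour depends on the caller.

```java
// Hypothetical helper class; the names are illustrative only.
class FirstTokenGuardSketch {
    static String dropFirstToken(String s) {
        int space = s.indexOf(' ');
        if (space < 0) {
            // Assumption: with only one token, nothing remains after removing it.
            return "";
        }
        return s.substring(space + 1);
    }

    public static void main(String[] args) {
        String single = "token";
        // Unguarded: indexOf is -1, so this is substring(0), the whole string.
        System.out.println("[" + single.substring(single.indexOf(" ") + 1) + "]"); // [token]
        System.out.println("[" + dropFirstToken(single) + "]");                    // []
        System.out.println("[" + dropFirstToken("I want to remove I") + "]");      // [want to remove I]
    }
}
```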

Upvotes: 3
