D180
D180

Reputation: 742

java string.replace running too slow with methods

I have a method with a String as an argument that replaces specific placeholders with strings from methods:

public static String format(String in)
{
    in=in.replace("[time]",getTime()); //just some methods returning strings
    in=in.replace("[name]",getName());
}

my problem is that even if the string "[time]" doesn't occur in the string in, the methods getTime() and getName() are called every time format(String) is called. Since i call format(String) hundreds of times, it takes me up to an minute to parse all the strings. Since getTime() and getName() require much time to execute and the substrings "[time]" and "[name]" are rare, it would be much faster if the replace method only called getTime() or getName() when there is a placeholder. the fastest possible solution i see is to rewrite string.replace() to replace(String). Now my two questions are:

  1. is there a faster way to do this?
  2. which is the fastest way to write replace(String)?

Upvotes: 1

Views: 1569

Answers (3)

atk
atk

Reputation: 9314

Performance testing is king to determine what's fastest.

A little code to benchmark three different ways to write this...

import java.io.IOException;
import java.util.Properties;
import java.util.Vector;


public class BenchmarkStringStuff {


    public static void main(String[] args) {
        try {
            System.getProperties().store(System.out, "");
        } catch (IOException ioe ) {
            System.out.println("Failed to write properties to STDOUT");
            return;
        }


        BenchmarkStringStuff bss = new BenchmarkStringStuff();
        bss.benchmark(10);
        bss.benchmark(1000);
        bss.benchmark(100000);
        bss.benchmark(1000000);
    }

    public void benchmark(int numTests) 
    {
        Vector<Test> tests = new Vector<Test>();
        tests.add(new OriginalCode());
        tests.add(new TwoSearches());
        tests.add(new SearchOnceAndSubstring());


        Vector<String> testStrings = new Vector<String>();
        // we have a very poor sample, here.  You should test with a better 
        // representation of your expected values
        testStrings.add("aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa[time]aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa");

        Performance overallRuntime = new Performance("overall combined runtime");
        overallRuntime.setCount(numTests * testStrings.size() * tests.size());
        overallRuntime.start();
        for( Test test : tests )
        {
            Performance codePerformance = new Performance(test.getClass().getName());
            codePerformance.setCount(numTests * testStrings.size());
            codePerformance.start();
            for(String testString : testStrings)
            {
                //Performance stringPerformance = new Performance(testString);
                //stringPerformance.setCount(numTests);
                //stringPerformance.start();
                for(int i=0; i<numTests; i++)
                {
                    test.test(testString);
                }
                //stringPerformance.end();
                //System.out.println(stringPerformance.toString());
            }
            codePerformance.end();
            System.out.println(codePerformance.toString());
        }
        overallRuntime.end();
        System.out.println(overallRuntime.toString());
        System.out.println();

    }

    private static String getTime()
    { 
        return "a date"; 
        // static value to avoid any caching behavior - we
        // want to test the algorithm, not the external stuff.
        // you should use your real getTime method to see real numbers
        // for your application
    }

    private static String getName(){ return "a name"; }


    private interface Test {
        public String test(String in);
    }

    public class OriginalCode implements Test {

        public String test(String in)
        {
            in=in.replace("[time]",getTime()); //just some methods returning strings
            in=in.replace("[name]",getName());
            return in;
        }
    }

    public class TwoSearches implements Test {
        public String test(String in)
        {
            if (in.contains("[time]"))
            {
                in=in.replace("[time]",getTime());
            }
            return in;
        }
    }

    public class SearchOnceAndSubstring implements Test {
        public String test(String in)
        {
            String REPLACEME = "[time]";
            int idx = in.indexOf(REPLACEME);
            if( idx == 0 )
            {
                in=getTime() + in.substring(REPLACEME.length());
            }
            else if( idx > 0 )
            {
                in = in.substring(0,idx) + getTime();
                if( idx+REPLACEME.length() < in.length())
                {
                    in += in.substring(idx+REPLACEME.length());
                }
            }
            return in;
        }
    }





    private class Performance
    {
        long start = 0;
        long end = 0;
        String name = null;
        public int count = 0;

        public Performance(String name)
        {
            this.name = name;
        }

        public void start(){ start = System.currentTimeMillis(); }
        public void end() { end   = System.currentTimeMillis(); }
        public void setCount(int count){ this.count = count; } 

        /** be sure to call start & stop first **/
        public long total(){ return end - start; }


        public String toString()
        {
            return count + "cycles    start:"+start+"     end:"+end+"    total:"+total() + "   <---- " + name;
        }
    }
}

And the results of running this code (performance-unrelatedstuff redacted)...

#
#Thu Mar 14 18:45:37 EDT 2013
java.runtime.name=Java(TM) SE Runtime Environment
java.vm.version=20.5-b03
java.vm.vendor=Sun Microsystems Inc.
java.vendor.url=http\://java.sun.com/
java.vm.name=Java HotSpot(TM) Client VM
sun.java.launcher=SUN_STANDARD
sun.os.patch.level=Service Pack 1
java.vm.specification.name=Java Virtual Machine Specification
java.runtime.version=1.6.0_30-b12
os.arch=x86
java.vm.specification.vendor=Sun Microsystems Inc.
user.variant=
os.name=Windows 7
java.specification.name=Java Platform API Specification
java.class.version=50.0
sun.management.compiler=HotSpot Client Compiler
os.version=6.1
java.specification.version=1.6
java.class.path=REDACTED
java.vm.specification.version=1.0
sun.java.command=BenchmarkStringStuff
sun.arch.data.model=32
java.specification.vendor=Sun Microsystems Inc.
java.version=1.6.0_30
java.vendor=Sun Microsystems Inc.
sun.desktop=windows
sun.cpu.isalist=pentium_pro+mmx pentium_pro pentium+mmx pentium i486 i386 i86
10cycles    start:1363301137531     end:1363301137531    total:0   <---- BenchmarkStringStuff$OriginalCode
10cycles    start:1363301137531     end:1363301137531    total:0   <---- BenchmarkStringStuff$TwoSearches
10cycles    start:1363301137531     end:1363301137531    total:0   <---- BenchmarkStringStuff$SearchOnceAndSubstring
30cycles    start:1363301137531     end:1363301137531    total:0   <---- overall combined runtime

1000cycles    start:1363301137531     end:1363301137546    total:15   <---- BenchmarkStringStuff$OriginalCode
1000cycles    start:1363301137546     end:1363301137546    total:0   <---- BenchmarkStringStuff$TwoSearches
1000cycles    start:1363301137546     end:1363301137562    total:16   <---- BenchmarkStringStuff$SearchOnceAndSubstring
3000cycles    start:1363301137531     end:1363301137562    total:31   <---- overall combined runtime

100000cycles    start:1363301137562     end:1363301138108    total:546   <---- BenchmarkStringStuff$OriginalCode
100000cycles    start:1363301138108     end:1363301138451    total:343   <---- BenchmarkStringStuff$TwoSearches
100000cycles    start:1363301138451     end:1363301138499    total:48   <---- BenchmarkStringStuff$SearchOnceAndSubstring
300000cycles    start:1363301137562     end:1363301138499    total:937   <---- overall combined runtime

1000000cycles    start:1363301138499     end:1363301143663    total:5164   <---- BenchmarkStringStuff$OriginalCode
1000000cycles    start:1363301143663     end:1363301146784    total:3121   <---- BenchmarkStringStuff$TwoSearches
1000000cycles    start:1363301146784     end:1363301147190    total:406   <---- BenchmarkStringStuff$SearchOnceAndSubstring
3000000cycles    start:1363301138499     end:1363301147190    total:8691   <---- overall combined runtime

The winner is SearchOnceAndSubstring, at a consistent order of magnitude faster than the others.

There may be other optimizations by playing directly with the char arrays, but I suspect that SearchOnceAndSubstring would be very close, with most of the time lost on the string concatenation. I'll leave performance testing of that as an exercise for the reader.

Upvotes: 0

BillRobertson42
BillRobertson42

Reputation: 12883

Logically, replace must search for the string first before replacing it, so in cases where the string does not occur it's equivalent to calling contains, because contains simply does a search. So searching first would not speed it up.

It sounds like the code might be doing many repeated searches over the same source strings. If so, then the answer may be to stop doing it so much. You might want to scan everything first, store the replaced results and then retrieve those when necessary.

Upvotes: 1

rgettman
rgettman

Reputation: 178263

If searching in the string takes less time than calling getTime() and getName(), then use contains first.

if (in.contains("[time]"))
{
    in=in.replace("[time]",getTime());
}

Then you only call getTime() if it's needed. Same for getName().

Alternately, if getTime() and getName() always return the same result, then call them only once and store their results for later use.

Upvotes: 1

Related Questions