Reputation:
For the following code, I get an average computation time of 50 ms. How can I optimize
filter(u -> myStrings.contains(u.getName())
to get faster computation time?
list size 3000000, averaged 100 times
LongSummaryStatistics{count=100, sum=5135, min=46, average=51,350000, max=147}
Remark: the class User2 contains other attributes, too (plus getters, setters).
code:
public class Temp2 {
static HashSet<String> myStrings = new HashSet<>();
static long test1(List<User2> user2s) {
long time1 = System.currentTimeMillis();
ArrayList<User2> collect = user2s.stream()
.filter(u -> myStrings.contains(u.getName()))
.collect(Collectors.toCollection(ArrayList::new));
long time2 = System.currentTimeMillis();
return time2 - time1;
}
static class User2 {
String name;
public User2(String name) {this.name = name;}
public String getName() {return name;}
public void setName(String name) {this.name = name;}
}
public static void main(String... args) {
for (int i = 0; i < 15; i++) {myStrings.add(getRandomString());}
int size = 3_000_000;
List<User2> user2s = new ArrayList<>();
for (int i = 0; i < size; i++) {
user2s.add(new User2(getRandomString()));
}
repeat("test:", user2s, Temp2::test1, 100);
}
private static void repeat(String name, List<User2> user2s, ToLongFunction<List<User2>> test, int iterations) {
System.out.println("list size " + user2s.size() + ", averaged " + iterations + " times");
System.out.println(
IntStream.range(0, iterations)
.mapToLong(i -> test.applyAsLong(user2s))
.summaryStatistics());
}
private static String getRandomString() {
SecureRandom random = new SecureRandom();
return (new BigInteger(130, random).toString(32)).substring(0,12);
}
}
Upvotes: 0
Views: 325
Reputation: 816
As other fellow stackoverflower's pointed - the bottleneck is the hashCode
method.
Particularly for a String
- to compute a hashCode
you would need to traverse the string. So the longer String
is - the worse it gets, assuming you have a lot of unique strings. Luckily in your example they are quite short - only 12 chars.
Judging by a JMH microbenchmark on my machine (Core-i3 6100U 2.3 GHz) here is the timings per single operation for different implementations of contains
:
HashSet<String>
HashSet<Integer>
BitSet
In your case it means, that for a 3000000 list it will take approximately 174ms, 148ms and 105ms respectively on my machine. So you could gain up to ~1.5x speed up by changing a comparison approach.
So you if you make an additional fields in your User2
, that will hold a transformed representaion of name
which could suit something different than HashSet<String>
- you could benefit from there. Taking in consideration, that you have only 15 myStrings
- a BitSet
could be quite suitable there.
Apart from that, parallelStream
really doubles the performance - so if you have cores to spend - that would be the best option.
P.S. the benchmark code:
import org.openjdk.jmh.annotations.*;
import java.math.BigInteger;
import java.security.SecureRandom;
import java.util.*;
import java.util.concurrent.TimeUnit;
@State(Scope.Benchmark)
public class ProperBench {
private HashSet<String> myStrings;
private String randomName;
private HashSet<Integer> myInts;
private Integer randomInt;
private BitSet myBits;
private int randomBit;
List<User2> user2s = new ArrayList<>();
private static final int MY_STRINGS_SIZE = 15;
private class User2 {
String name;
public User2(String name) {this.name = name;}
public String getName() {return name;}
public void setName(String name) {this.name = name;}
}
private String getRandomString() {
SecureRandom random = new SecureRandom();
return (new BigInteger(130, random).toString(32)).substring(0,12);
}
@Setup(Level.Invocation)
public void setUpIteration() {
myStrings = new HashSet<>();
myInts = new HashSet<>();
myBits = new BitSet(MY_STRINGS_SIZE);
SecureRandom random = new SecureRandom();
for (int i = 0; i < MY_STRINGS_SIZE; i++) {
String aString = getRandomString();
myStrings.add(aString);
myInts.add(aString.hashCode());
if (random.nextBoolean()) myBits.set(i);
}
randomName = getRandomString();
randomInt = randomName.hashCode();
randomBit = random.nextInt(MY_STRINGS_SIZE);
}
@Benchmark
@BenchmarkMode(Mode.SampleTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public boolean test1() {
return myStrings.contains(randomName);
}
@Benchmark
@BenchmarkMode(Mode.SampleTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public boolean test2() {
return myInts.contains(randomInt);
}
@Benchmark
@BenchmarkMode(Mode.SampleTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public boolean test3() {
return myBits.get(randomBit);
}
}
Upvotes: 2