mcacorner
mcacorner

Reputation: 1274

Java : Searching Ids from hashset or String

I have large number of IDs which can I store in HashSet or String i.e.

String strIds=",1,2,3,4,5,6,7,8,.,.,.,.,.,.,.,1000,";
    Or
HashSet<String> setOfids = new HashSet<String>();
setOfids.put("1");
setOfids.put("2");
.
.
.
setOfids.put("1000");

Further more I want to perform search on IDs

Which Should I use for better Performance(Faster & memory efficient)

1) strIds.indexOf("someId");
    or
2) setOfids.contains("someId");

Tell me any other way so, I can do the same. Thanks for Looking here :)

Upvotes: 0

Views: 510

Answers (5)

Ambrish
Ambrish

Reputation: 3677

Set will be better choice. Reasons:

  • Search will be O(1) in case of Set. In case of String it will be O(N).
  • Performance will not degrade as data grows.
  • String will use more memory if you want to do any kind of data manipulation (add or remove IDs).
  • indexOf might give you negative result as well

    Say 1000 is present but 100 is not, so indexOf will return the location of 1000 as 100 is substring of 1000.

Simple POC code for the performance:

import java.util.HashSet;
import java.util.Set;

public class TimeComputationTest {

  public static void main(String[] args) {
    String strIds = null;
    Set<String> setOfids = new HashSet<String>();
    StringBuffer sb = new StringBuffer();

    for (int i = 1;i <= 1000;i++) {
      setOfids.add(String.valueOf(i));
      if (sb.length() != 0) {
        sb.append(",");
      }
      sb.append(i);
    }
    strIds = sb.toString();

    testTime(strIds, setOfids, "1");
    testTime(strIds, setOfids, "100");
    testTime(strIds, setOfids, "500");
    testTime(strIds, setOfids, "1000");
  }

  private static void testTime(String strIds, Set<String> setOfids, String string) {
    long startTime = System.nanoTime();
    strIds.indexOf(string);
    long endTime = System.nanoTime();

    System.out.println("String search time for (" + string + ") is " + (endTime - startTime));

    startTime = System.nanoTime();
    setOfids.contains(string);
    endTime = System.nanoTime();

    System.out.println("HashSet search time for (" + string + ") is " + (endTime - startTime));
  }
}

The output will be (approx.):

String search time for (1) is 3000
HashSet search time for (1) is 7000
String search time for (100) is 6000
HashSet search time for (100) is 2000
String search time for (500) is 33000
HashSet search time for (500) is 2000
String search time for (1000) is 71000
HashSet search time for (1000) is 1000

Upvotes: 2

HJK
HJK

Reputation: 1382

I assume HashSet is the better option to go with. There are two advantages:

  1. It doesn't allow duplicates
  2. HashSet internally assumes a HashMap, hence retrieval is faster.

Upvotes: 1

Krunal Patel
Krunal Patel

Reputation: 186

It will work faster:::

String strIds=",1,2,3,4,5,6,7,8,.,.,.,.,.,.,.,1000,";
String searchStr = "9";
boolean searchFound = strIds.contains(","+searchStr +",");

Upvotes: 0

Erwin
Erwin

Reputation: 3366

Besides the performances, you shouldn't use a String like that. Although it is creative, it is not made for indexing like that. What would happen if you want to change the format of the ids?

To improve the performance and save memory of hashSet you could of course use

HashSet<Integer> instead of HashSet<String>

Upvotes: 2

laune
laune

Reputation: 31300

A hash table lookup is "constant time", i.e., it does not grow with the number of ids.

But a compact string of all id's in a String requires the least memory.

So, make up your mind: fastest retrieval or a minimum of storage!

Upvotes: 3

Related Questions