Reputation: 730
I have written these two algorithms to check a string for duplicate characters (e.g. ABBC, AAAC). The first uses a HashSet, whilst the second relies purely on iteration.
Algorithm 1
String s = "abcdefghijklmnopqrstuvwxxyz";

public boolean isUnique(String s) {
    Set<Character> charSet = new HashSet<Character>();
    for (int i = 0; i < s.length(); i++) {
        if (charSet.contains(s.charAt(i))) {
            return false;
        }
        charSet.add(s.charAt(i));
    }
    return true;
}
Algorithm 2
String s = "abcdefghijklmnopqrstuvwxxyz";

public boolean isUnique2(String s) {
    for (int i = 0; i < s.length() - 1; i++) {
        for (int j = i + 1; j < s.length(); j++) {
            if (s.charAt(i) == s.charAt(j)) {
                return false;
            }
        }
    }
    return true;
}
My thoughts are that the first algorithm is O(N) and the second is O(N^2). When I run execution-time tests on my (possibly unreliable) laptop, the first algorithm averages 2020 ns, whilst the second averages 995 ns. This goes against my calculation of the algorithms' complexity; could anybody advise me?
Upvotes: 0
Views: 136
Reputation: 2257
There is a problem with your test data. For example, if you limit yourself to English-language characters (a-z), you are guaranteed to have a duplicate once the string length exceeds 26. In the specific example you provided, the string "abcdefghijklmnopqrstuvwxxyz" is sorted and the duplicate element x sits towards the end. As such the iterated lookup is faster here, because there is overhead in building the HashSet as you continue to parse the String.
A better test would be to use randomly generated integer sequences of large sizes and with a large max value, e.g. Long.MAX_VALUE.
Below is a test that disproves your assertion that array search is faster. Run it a few times and see for yourself, or take averages from 1000 runs, etc:
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

public class FindDuplicatesTest {

    public static final String s = generateRandomString(100000);

    private static String generateRandomString(int numChars) {
        Random random = new Random();
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < numChars; i++) {
            int codePoint = random.nextInt(65536);
            sb.append(Character.toChars(codePoint));
        }
        return sb.toString();
    }

    public boolean isUnique(String s) {
        Set<Character> charSet = new HashSet<Character>();
        for (int i = 0; i < s.length(); i++) {
            if (charSet.contains(s.charAt(i))) {
                return false;
            }
            charSet.add(s.charAt(i));
        }
        return true;
    }

    public boolean isUnique2(String s) {
        for (int i = 0; i < s.length() - 1; i++) {
            for (int j = i + 1; j < s.length(); j++) {
                if (s.charAt(i) == s.charAt(j)) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        FindDuplicatesTest app = new FindDuplicatesTest();
        long start = System.nanoTime();
        boolean result = app.isUnique(s);
        long stop = System.nanoTime();
        System.out.println(result);
        System.out.println("HashSet Search Time: " + (stop - start));

        start = System.nanoTime();
        result = app.isUnique2(s);
        stop = System.nanoTime();
        System.out.println(result);
        System.out.println("Array Search Time: " + (stop - start));
    }
}
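As one way of taking those averages: below is a hedged sketch that reruns both approaches many times on the same random input and reports the mean per-call time. The class and method names here are my own, mirroring the ones above; the run count and string size are illustrative.

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

// Sketch: average each method's running time over many runs on one
// random input, so a single noisy measurement doesn't dominate.
public class AverageBenchmark {

    static boolean isUnique(String s) {
        Set<Character> seen = new HashSet<>();
        for (int i = 0; i < s.length(); i++) {
            if (!seen.add(s.charAt(i))) {   // add() returns false on a duplicate
                return false;
            }
        }
        return true;
    }

    static boolean isUnique2(String s) {
        for (int i = 0; i < s.length() - 1; i++) {
            for (int j = i + 1; j < s.length(); j++) {
                if (s.charAt(i) == s.charAt(j)) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Build one random 100k-char input, reused across all runs.
        Random random = new Random();
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 100_000; i++) {
            sb.append((char) random.nextInt(65536));
        }
        String s = sb.toString();

        int runs = 1000;
        long hashTotal = 0, loopTotal = 0;
        for (int r = 0; r < runs; r++) {
            long t0 = System.nanoTime();
            isUnique(s);
            hashTotal += System.nanoTime() - t0;

            t0 = System.nanoTime();
            isUnique2(s);
            loopTotal += System.nanoTime() - t0;
        }
        System.out.println("HashSet mean ns:     " + hashTotal / runs);
        System.out.println("Nested-loop mean ns: " + loopTotal / runs);
    }
}
```

Even averaged like this, results stay sensitive to JIT warm-up; a harness such as JMH is the more rigorous option.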
Upvotes: 0
Reputation: 9450
Micro-benchmarking like this can give very misleading information about algorithmic complexity.
It's easy to "port" your algorithms to check for duplicates in, say, an array of Integers.
Then I recommend testing performance on, say, an array of 10^7 elements, and you will definitely see the difference.
This way you'd be able to confirm your initially correct estimate: O(N) for the HashSet version vs O(N^2) for the nested-loop version.
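The suggested port might look like the sketch below. The names are my own; I use long values (rather than Integer) to get a large value range, and the array size in main is illustrative, kept moderate so it runs in reasonable memory.

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

// Sketch of the suggested port: duplicate detection over a large random
// array, so growth in N (not character-set saturation) drives the timing.
public class IntDuplicates {

    static boolean isUniqueHash(long[] a) {
        Set<Long> seen = new HashSet<>();
        for (long v : a) {
            if (!seen.add(v)) {     // add() returns false on a duplicate
                return false;
            }
        }
        return true;
    }

    static boolean isUniqueLoops(long[] a) {
        for (int i = 0; i < a.length - 1; i++) {
            for (int j = i + 1; j < a.length; j++) {
                if (a[i] == a[j]) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Random random = new Random();
        long[] a = new long[1_000_000];
        for (int i = 0; i < a.length; i++) {
            a[i] = random.nextLong();   // spread across the full long range
        }
        long t0 = System.nanoTime();
        boolean unique = isUniqueHash(a);
        System.out.println("hash: " + unique + " in " + (System.nanoTime() - t0) + " ns");
        // Deliberately not timing isUniqueLoops at this size: with duplicates
        // this rare, the O(N^2) scan would run for hours. Time it on a much
        // smaller slice and watch the gap widen as N grows.
    }
}
```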
Upvotes: 1
Reputation: 8147
When using O() notation you ignore constants, which means that 10^10 * n is still O(n). So while O(n^2) > O(n) holds asymptotically, the quadratic algorithm is not necessarily slower for smaller values of n. In your case, imagine that resizing the array behind the HashSet could be more time-consuming than iterating the input.
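One way to shrink that constant is to presize the set so it never resizes. This is a sketch under my own assumptions (class and method names are mine); the capacity formula accounts for HashMap's default load factor of 0.75.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch: a presized HashSet avoids the rehash/resize work that
// contributes to the constant hidden inside O(N).
public class PresizedSet {

    static boolean isUnique(String s) {
        // Capacity chosen so that s.length() elements never trigger a
        // resize under the default 0.75 load factor.
        Set<Character> seen = new HashSet<>(s.length() * 4 / 3 + 1);
        for (int i = 0; i < s.length(); i++) {
            if (!seen.add(s.charAt(i))) {   // add() returns false on a duplicate
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isUnique("abcdef"));   // true
        System.out.println(isUnique("abccba"));   // false ('c' repeats)
    }
}
```

The asymptotic cost is unchanged; only the constant factor shrinks, which is exactly the part O() notation ignores.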
Upvotes: 2
Reputation: 133
Assuming that the charAt method runs in O(1) time, the first algorithm is O(N) and the second is O(N^2). A linear-time algorithm is not supposed to be faster than a quadratic one for all inputs; it will only be faster after a certain N (which could possibly be in the millions).
For example:
void funcA(int n) {
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < 10000; j++) {
            int k = i + j;
        }
    }
}

void funcB(int n) {
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            int k = i + j;
        }
    }
}
Even though funcA is linear and funcB is quadratic, it is easy to see that funcB will be faster than funcA for n < 10000. In your case the HashSet requires time to compute hashes, and so may be slower for inputs below a certain size.
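Counting inner-loop iterations instead of timing them makes that crossover exact. A small sketch using the funcA/funcB shapes above (the class and method names are mine):

```java
// Sketch: count the inner-loop iterations of the funcA/funcB shapes
// rather than timing them, so the crossover point is deterministic.
public class Crossover {

    static long opsA(int n) { return (long) n * 10000; }  // linear: n * constant
    static long opsB(int n) { return (long) n * n; }      // quadratic: n * n

    public static void main(String[] args) {
        for (int n : new int[] {100, 10000, 1000000}) {
            System.out.println("n=" + n
                    + " funcA=" + opsA(n)
                    + " funcB=" + opsB(n)
                    + " quadraticFaster=" + (opsB(n) < opsA(n)));
        }
    }
}
```

Below n = 10000 the quadratic shape does strictly less work; above it, the linear shape wins, and the gap grows without bound.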
Upvotes: 1