damon
damon

Reputation: 8477

Removing duplicates from a sorted int[] using binarysearch

I have a rather large int[] which is sorted using Arrays.sort() .. I need to remove the duplicate elements from the array.

This question originates from sedgewick's Algorithms book 1.1.28

1.1.28 Remove duplicates. Modify the test client in BinarySearch to remove any du- plicate keys in the whitelist after the sort.

I tried to create a noDupes() method which takes in an int[] and returns an int[] with duplicates removed

The rank() methods are from sedgewick's code.which does the binary search

public static int[] noDupes(int[] a){
    Arrays.sort(a);
    int maxval= a[a.length-1];
    int[] nodupes = new int[maxval];
    int i=0;
    for(int j=0;j<a.length;j++){
        int rnk = rank(a[j],nodupes);
        System.out.println(a[j]+" rank="+rnk);
        if (rnk < 0){
            System.out.println(a[j]+" is not dupe");
            nodupes[i] = a[j];
            i++;
        }
    }

    return nodupes;
}
public static int rank(int key,int[] a){
    return rank(key,a,0,a.length-1);
}

public static int rank(int key,int[] a,int lo,int hi){
    if(lo > hi) return -1;
    int mid = lo+(hi-lo)/2;

    if(key < a[mid])return rank(key,a,0,mid-1);
    else if(key > a[mid])return rank(key,a,mid+1,hi);
    else return mid;
}

When I ran this with a sample array

int[] a =new int[]{2,2,2,3,4,4,5,6};
int[] ret = noDupes(a);

I am getting some unexpected output..even after 2 is added to the nodupes array, the rank for an existing element is -1..

2 rank=-1
2 is not dupe
2 rank=-1
2 is not dupe
2 rank=-1
2 is not dupe
3 rank=-1
3 is not dupe
4 rank=-1
4 is not dupe
4 rank=4
5 rank=-1
5 is not dupe
6 rank=-1
6 is not dupe
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 6
    at ...noDupes(BinSearch.java:85)
    at ...main(BinSearch.java:96)

I couldn't figure out what I am doing wrong..can someone help?

Upvotes: 0

Views: 2881

Answers (5)

Darshan Mehta
Darshan Mehta

Reputation: 30819

This code will help you.

public Integer[] removeDuplicates(Integer[] input){
        Integer[] arrayWithoutDuplicates = null;
        Set<Integer> set = new LinkedHashSet<Integer>();
        for(int i : input){
            set.add(i);
        }
        arrayWithoutDuplicates = (Integer[]) set.toArray();
        return arrayWithoutDuplicates;
}

Upvotes: 0

sbhatt
sbhatt

Reputation: 485

If you have your array sorted and if you want to remove duplicates I think you don't need to use binary search for that.

when you sort an array, duplicate elements will be adjacent to each other.

E.g. Array = {9,8,9,1,2,5,2,5,1} After sorting Array = {1,1,2,2,5,5,8,9,9}

You can use following way to remove duplicates (inplace)

int a[] = {sorted array}

for(int i=0,target=0;i<a.length-1;i++) {
  if(a[i]!=a[i+1]) {
     a[target++] = a[i];
  }
}
a[target++] = a[a.length-1];
for(int i=target;i<a.length;i++) {
a[i] = 0; // fill in the values which you don't want.
}

will remove duplicates in one pass only

Upvotes: 2

ajay.patel
ajay.patel

Reputation: 1967

This should help:

int[] nodupes = new int[a.length];

nodupes array is going out of bound.

Note: I am not sure if logic that you are using is the best for the problem. But this should solve your exception.

Upvotes: 0

Evgeniy Dorofeev
Evgeniy Dorofeev

Reputation: 136042

I would do it this way

public static int[] noDupes(int[] a) {
    Arrays.sort(a);
    int noDupCount = 0;
    for (int i = 0; i < a.length; i++) {
        if (i == 0 || a[i] != a[i - 1]) {
            noDupCount++;
        }
    }
    int[] a2 = new int[noDupCount];
    for (int i = 0, j = 0; i < a.length; i++) {
        if (i == 0 || a[i] != a[i - 1]) {
            a2[j++] = a[i];
        }
    }
    return a2;
}

Upvotes: 3

shreyansh jogi
shreyansh jogi

Reputation: 2102

just add all the array values to the HashSet it will automatically remove duplicates and gives you unique values and then again convert it to array that you required

Upvotes: 3

Related Questions