Reputation: 11
I have a square matrix containing integers (not necessarily distinct). I need the fastest way to find the number of distinct elements in it. I tried storing the integers in a 1D array, sorting it, and then counting the distinct elements... but apparently that is not fast enough. Could you suggest a better and faster procedure in C?
Upvotes: 1
Views: 14261
Reputation: 2633
I would suggest the following approach: insert every element of the matrix into a hash map and then read off the number of keys. Since each key can occur only once, the map ends up with exactly one entry per distinct value.
The time complexity is dominated by building the hash map, which is O(n) on average for n elements. This requires no sorting, so it is more efficient than the approach you are using, and it is independent of the range of the input data, which makes it more general.
(I am not good at implementing things in C, so I will include Java code that demonstrates the approach.)
import java.util.HashMap;

class Distinct {
    // POPULATE THE MATRIX BY IMPLEMENTING A CUSTOM METHOD (stub shown here)
    static void populate(int[][] matrix) {
        // fill with your data
    }

    public static void main(String ar[]) {
        int size = 3; // matrix dimension; set as needed
        int matrix[][] = new int[size][size];
        populate(matrix);
        // ALGORITHM: a key can occur only once in the map, so after
        // inserting every element there is one entry per distinct value
        HashMap<Integer, Boolean> distinct = new HashMap<Integer, Boolean>();
        for (int i = 0; i < size; i++) {
            for (int j = 0; j < size; j++) {
                distinct.put(matrix[i][j], true);
            }
        }
        System.out.println("Number of distinct elements: " + distinct.size());
    }
}
Pointers on implementing a hashmap in C can be found here: Implementing a HashMap
I hope this helps!
Upvotes: 0
Reputation: 2106
First, it depends on the way you treat your array. Whether it is dynamic or not, you can use a 2D array as a 1D array: a static 2D array IS laid out as one contiguous 1D array, and a dynamic one can be created as a single allocation.
#include <stdlib.h>

const int M = 100;
const int N = 200;
int **a = NULL;
int i, j;
/* one allocation: M row pointers followed by the M*N ints themselves */
a = (int**) malloc(M * sizeof(int*) + N * M * sizeof(int));
a[0] = (int*)(a + M);
for (i = 1; i < M; i++) {
    a[i] = a[0] + i * N;
}
//code
free(a);
and the indexing is equivalent: a[i][j] refers to the same element as a[0][i * num_of_columns + j]
So, here are two algorithms that work on 1D arrays:
typedef int T;
#define EQ(a, b) ((a)==(b))
/* O(n^2) in-place dedup: keep the first occurrence of each value */
void quadDiff(T *a, size_t *out_size) {
    size_t i, j;
    size_t size = *out_size;
    size_t pos = 0;
    int unique;
    for (i = 0; i < size; i++) {
        unique = 1;
        /* compare only against the already-compacted unique prefix */
        for (j = 0; j < pos; j++) {
            if (EQ(a[i], a[j])) {
                unique = 0;
                break;
            }
        }
        if (unique) {
            a[pos++] = a[i];
        }
    }
    *out_size = pos;
}
and
/* O(n log n): sort first, then keep one copy of each run of equal values */
void sortDiff(T *a, size_t item_size, size_t *out_size, int (*cmp)(const void *, const void *)) {
    size_t i;
    size_t pos = 0;
    T prev;
    if (*out_size == 0) return;
    qsort(a, *out_size, item_size, cmp);
    prev = a[0];   /* read the first element AFTER sorting */
    a[pos++] = a[0];
    for (i = 1; i < *out_size; i++) {
        if (EQ(prev, a[i])) {
            continue;
        }
        prev = a[i];
        a[pos++] = a[i];
    }
    *out_size = pos;
}
Upvotes: 0
Reputation: 11268
If, for example, you have a bounded set of integer values (say 0-99) and a 300 x 300 matrix, you can count distinct values with a simple presence array:
int array[100] = {0};   /* presence flags, one per possible value */
int i;
int j;
int n_unique = 0;
for (i = 0; i < 300; i++) {
    if (n_unique == 100) break;   /* all possible values already seen */
    for (j = 0; j < 300; j++) {
        if (array[mat[i][j]] == 0) {
            array[mat[i][j]] = 1;
            n_unique++;
            if (n_unique == 100) break;
        }
    }
}
The algorithm is O(n) in the number of matrix elements, with an early exit once all 100 possible values have been seen.
Upvotes: 1
Reputation: 10430
Usually there is a compromise between speed, memory, and complexity for any algorithm. As others have said, the more information you know about your data, the faster you can make your algorithm. Say you had numbers between 1 and 100 (as an example), you would be able to really optimize the algorithm with this information.
I took the time to write up an example algorithm that is generic for any data set. This assumes that your set size is sufficiently small or that you have enough memory available. Basically, the short version is: allocate a scratch array with as many elements as the original two-dimensional array, loop over the original array and append each value to the scratch array only if it is not already there, then count how many elements ended up in it:
#include <stdio.h>      /* printf, scanf, puts, NULL */
#include <stdlib.h>     /* srand, rand, malloc, free */
#include <time.h>       /* time */

typedef int bool;
#define TRUE 1
#define FALSE 0

/* The actual algorithm function - finds the number of unique values */
int NumberUniqueValues(int **array, int width, int height)
{
    int i = 0, j = 0, k = 0, maxFilled = 0;
    bool wasFound = FALSE;
    int *newElements = malloc(sizeof(int) * width * height);
    for (i = 0; i < height; i++) {
        for (j = 0; j < width; j++) {
            /* linear search of the values collected so far */
            wasFound = FALSE;
            for (k = 0; k < maxFilled; k++) {
                if (newElements[k] == array[i][j]) {
                    wasFound = TRUE;
                    break;
                }
            }
            if (!wasFound) newElements[maxFilled++] = array[i][j];
        }
    }
    /* Free space */
    free(newElements);
    return maxFilled;
}

int main()
{
    /* variables */
    int i = 0, j = 0;
    int originalWidth = 10;
    int originalHeight = 10;

    /* initialize array */
    int **originalArray = (int **)malloc(originalHeight * sizeof(int*));
    for (i = 0; i < originalHeight; i++) {
        originalArray[i] = (int *)malloc(originalWidth * sizeof(int));
    }

    /* initialize random seed, then fill with random values */
    srand(time(NULL));
    for (i = 0; i < originalHeight; i++) {
        for (j = 0; j < originalWidth; j++) {
            originalArray[i][j] = rand() % 100;
        }
    }

    printf("Number unique values: %d\n", NumberUniqueValues(originalArray, originalWidth, originalHeight));

    /* Free space */
    for (i = 0; i < originalHeight; i++) free(originalArray[i]);
    free(originalArray);
    return 0;
}
Again, this might not be the fastest algorithm for your case since I don't know all the details but it will at least work. Best of luck!
Upvotes: 0
Reputation: 4216
What will be fastest is very dependent on the data you are dealing with, the sizes of the structures involved, etc.
Do you have bounds on the values the integers can take? If so, keeping an array indexed by integer value, initialized to zeros, which tracks how many copies of that value are in the matrix, will probably be fastest while keeping space usage reasonable.
If not, then possibly using a hash table to do something similar will be fastest.
But in any case, having more precise parameters for the problem would be very helpful.
Upvotes: 2