user7

Reputation: 2389

count the number of distinct absolute values among the elements of the array

I was asked an interview question to find the number of distinct absolute values among the elements of an array. I came up with the following solution (in C++), but the interviewer was not happy with the code's run-time efficiency.

  1. I would appreciate pointers on how I can improve the run-time efficiency of this code.
  2. Also, how do I calculate the efficiency of the code below? The for loop executes A.size() times. However, I am not sure about the efficiency of STL std::find (in the worst case it could be O(n), so does that make this code O(n²)?).

Code is:

#include <vector>
#include <list>
#include <algorithm>
#include <cstdlib>

int countAbsoluteDistinct ( const std::vector<int> &A ) {
  using namespace std;
  list<int> x;

  vector<int>::const_iterator it;
  for(it = A.begin();it != A.end();it++)
    if(find(x.begin(),x.end(),abs(*it)) == x.end()) // linear search through the list built so far
      x.push_back(abs(*it));
  return x.size();
}

Upvotes: 9

Views: 12182

Answers (13)

Flame

Reputation: 2207

To propose an alternative to the set-based code:

Note that since we don't want to alter the caller's vector, we take it by value. It's better to let the compiler copy for us than to make our own copy. If it's OK to destroy the caller's data, we can take it by non-const reference instead.

#include <vector>
#include <algorithm>
#include <iterator>

#include <cstdlib>

using namespace std;

int count_distinct_abs(vector<int> v)
{
    // Replace each element with its absolute value; the lambda avoids having to
    // pick a specific abs overload. O(n) where n = distance(v.begin(), v.end()).
    transform(v.begin(), v.end(), v.begin(), [](int x) { return abs(x); });
    sort(v.begin(), v.end()); // Average case O(n log n); pre-C++11 quicksort-based
    // implementations can be O(n^2) worst case. To guarantee worst case O(n log n),
    // replace with make_heap, then sort_heap.

    // unique takes a sorted range and moves the unique items to the front,
    // returning an iterator to the end of the unique section of the range
    // (the elements after it are left unspecified).
    auto unique_end = unique(v.begin(), v.end()); // Again n comparisons
    return distance(v.begin(), unique_end); // Constant time for random access iterators (like vector's)
}

The advantage here is that we only allocate/copy once if we decide to take by value, and the rest is all done in-place while still giving you an average complexity of O(n log n) on the size of v.
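
For instance, a minimal usage sketch (the sample values are made up for illustration; it assumes the count_distinct_abs function above is in scope):

#include <iostream>

int main()
{
    std::vector<int> values {-5, 5, 3, -3, 0, 7};
    std::cout << count_distinct_abs(values) << '\n'; // prints 4 (for 0, 3, 5, 7)
    return 0;
}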

Upvotes: 19

malat

Reputation: 12502

Since I was not happy with the previous answer, here is mine. Your initial question does not mention how big your vector is. Suppose your std::vector<> is extremely large and has very few duplicates (why not?). This means that using another container (e.g. std::set<>) will basically double your memory consumption. Why would you do that when your goal is simply to count the non-duplicates?

I like @Flame's answer, but I was not really happy with the call to std::unique: you spend lots of time carefully sorting your vector and then simply discard the sorted array, while you could be re-using it afterwards.

I could not find anything really elegant in the standard library, so here is my proposal (a mixture of std::transform + std::abs + std::sort, but without touching the sorted array afterwards).

#include <iterator> // for std::iterator_traits

// count the number of distinct values among the elements of a sorted range
// (for the original problem, apply std::abs to the elements before sorting)
template<class ForwardIt>
typename std::iterator_traits<ForwardIt>::difference_type 
count_unique(ForwardIt first, ForwardIt last)
{
  if (first == last)
    return 0;

  typename std::iterator_traits<ForwardIt>::difference_type 
    count = 1;
  ForwardIt previous = first;
  while (++first != last) {
    if (!(*previous == *first) ) ++count;
    ++previous;
  }
  return count;
}

Bonus point: it works with forward iterators:

#include <iostream>
#include <list>
int main()
{
  std::list<int> nums {1, 3, 3, 3, 5, 5, 7,8};
  std::cout << count_unique( std::begin(nums), std::end(nums) ) << std::endl;

  const int array[] = { 0,0,0,1,2,3,3,3,4,4,4,4};
  const int n = sizeof array / sizeof * array;
  std::cout << count_unique( array, array + n ) << std::endl;
  return 0;
}
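
For the original problem, the pieces could be combined roughly like this (a sketch only: count_absolute_distinct is just a name picked here, the count_unique template above is assumed to be in scope, and the vector is taken by value so the caller's data stays untouched):

#include <algorithm>
#include <cstdlib>
#include <vector>

int count_absolute_distinct(std::vector<int> v) // by value: we sort a private copy
{
  // replace each element with its absolute value, then sort
  std::transform(v.begin(), v.end(), v.begin(),
                 [](int x) { return std::abs(x); });
  std::sort(v.begin(), v.end());
  // count adjacent-distinct values without modifying the sorted data again
  return static_cast<int>(count_unique(v.begin(), v.end()));
}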

Upvotes: 2

V Malhi

Reputation: 17

You have nested loops in your code: scanning the whole array for each element gives you O(n^2) time complexity, which is not acceptable in most scenarios. That is the reason algorithms such as merge sort and quicksort were developed, to save processing cycles and machine effort. I suggest you go through the suggested links and redesign your program.

Upvotes: 0

Yourpalal

Reputation: 496

Basically, replace your std::list with a std::set. This gives you O(log(set.size())) searches plus amortized O(1) insertions, if you do things properly (e.g. insert with the hint obtained from the search). Also, for efficiency, it makes sense to cache the result of abs(*it), although this will have only a minimal (negligible) effect. The efficiency of this method is about as good as you can get without using a really good hash (std::set uses binary trees) or more information about the values in the vector.

Upvotes: 2

Ajeet Ganga

Reputation: 8653

One more approach:

Space efficient: use a tree-based map (e.g. std::map), O(log N) per insert, O(N log N) overall, and just keep the count of the number of elements successfully inserted.

Time efficient: use a hash table, O(1) expected per insert, O(N) overall, and just keep the count of the number of elements successfully inserted.

Upvotes: 0

Ajeet Ganga

Reputation: 8653

The best way is to customize the quicksort algorithm such that while partitioning, whenever we find two equal elements, we overwrite the second duplicate with the last element in the range and then shrink the range. This ensures you will not process duplicate elements twice. Also, after the quicksort is done, the size of the remaining range is the answer. The complexity is still O(n log n), BUT this should save at least two passes over the array.

Also, the savings are proportional to the percentage of duplicates. Imagine if they twist the original question with, say, '90% of the elements are duplicates' ...

Upvotes: 0

Michael Dorgan

Reputation: 12515

Sort the list with a radix-style sort for O(n)-ish efficiency, then compare adjacent values.

Upvotes: 0

Chris Mennie

Reputation: 674

As @Jerry said, to improve a little on the theme of most of the other answers, instead of using a std::map or std::set you could use a std::unordered_map or std::unordered_set (or the Boost equivalent).

This would reduce the runtime from O(n lg n) down to O(n).

Another possibility, depending on the range of the data given, is that you might be able to do a variant of a radix sort, though there's nothing in the question that immediately suggests this.

Upvotes: 0

Flame

Reputation: 2207

Two points.

  1. std::list is very bad for search. Each search is O(n).

  2. Use std::set. Insertion is logarithmic, it rejects duplicates, and it keeps the elements sorted. Insert every value (O(n log n) in total), then use set::size to find how many distinct values there are.

EDIT:

To answer part 2 of your question, the C++ standard mandates the worst case for operations on containers and algorithms.

Find: since you are using the free-function version of find, which takes iterators, it cannot assume anything about the passed-in sequence. It cannot assume that the range is sorted, so it must traverse every item until it finds a match, which is O(n).

If you are using set::find, on the other hand, this member find can utilize the structure of the set, and its performance is required to be O(log N), where N is the size of the set.

Upvotes: 1

karlphillip

Reputation: 93468

I think a std::map could also be interesting:

#include <map>
#include <vector>
#include <cstdlib>

int absoluteDistinct(const std::vector<int> &A) 
{
    std::map<int, char> my_map;

    for (std::vector<int>::const_iterator it = A.begin(); it != A.end(); it++)
    {
        my_map[abs(*it)] = 0; // the key is the absolute value; the mapped char is unused
    }

    return my_map.size();
}

Upvotes: 0

Mark B

Reputation: 96281

To answer your second question first, yes the code is O(n^2) because the complexity of find is O(n).

You have options to improve it. If the range of numbers is low you can just set up a large enough array and increment counts while iterating over the source data. If the range is larger but sparse, you can use a hash table of some sort to do the counting. Both of these options are linear complexity.
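
A rough sketch of the first option, assuming the absolute values are known to be bounded by some constant (the bound and the function name below are made up for illustration; a presence flag is enough here, since we only need to count distinct values):

#include <cstdlib>
#include <vector>

int count_abs_distinct_bounded(const std::vector<int>& a)
{
    const int MAX_ABS = 1000;               // assumed known bound on the absolute values
    std::vector<char> seen(MAX_ABS + 1, 0); // one flag per possible absolute value
    int count = 0;
    for (int x : a)
    {
        int v = std::abs(x);
        if (!seen[v])                       // first occurrence of this absolute value
        {
            seen[v] = 1;
            ++count;
        }
    }
    return count;
}

This runs in O(n + MAX_ABS) time, i.e. linear when the bound is comparable to the input size.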

Otherwise, I would do one iteration to take the abs value of each item, then sort them, and then you can do the aggregation in a single additional pass. The complexity here is n log(n) for the sort. The other passes don't matter for complexity.

Upvotes: 0

Jerry Coffin

Reputation: 490368

Yes, this will be O(N²) -- you'll end up with a linear search for each element.

A couple of reasonably obvious alternatives would be to use an std::set or std::unordered_set. If you don't have C++0x, you can replace std::unordered_set with tr1::unordered_set or boost::unordered_set.

Each insertion in an std::set is O(log N), so your overall complexity is O(N log N).

With unordered_set, each insertion has constant (expected) complexity, giving linear complexity overall.
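
A minimal sketch of the unordered_set version (C++11 spelling; the function name is just chosen for illustration, and with tr1 or Boost the header and namespace would change accordingly):

#include <cstdlib>
#include <unordered_set>
#include <vector>

int count_abs_distinct_hashed(const std::vector<int>& a)
{
    std::unordered_set<int> distinct;         // hashed container: O(1) expected insert
    for (int x : a)
        distinct.insert(std::abs(x));         // duplicates are silently ignored
    return static_cast<int>(distinct.size()); // number of distinct absolute values
}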

Upvotes: 3

Chad

Reputation: 19052

std::find() is linear (O(n)). I'd use a sorted associative container to handle this, specifically std::set.

#include <vector>
#include <set>
#include <cstdlib> // for abs
using namespace std;

int distinct_abs(const vector<int>& v)
{
   std::set<int> distinct_container;

   for(auto curr_int = v.begin(), end = v.end(); // no need to call v.end() multiple times
       curr_int != end;
       ++curr_int)
   {
       // std::set only allows single entries
       // since that is what we want, we don't care that this fails 
       // if the second (or more) of the same value is attempted to 
       // be inserted.
       distinct_container.insert(abs(*curr_int));
   }

   return distinct_container.size();
}

There is still some runtime penalty with this approach. Using a separate container incurs the cost of dynamic allocations as the container size increases. You could do this in place and not incur this penalty; however, with code at this level it's sometimes better to be clear and explicit and let the optimizer (in the compiler) do its work.

Upvotes: 3
