Reputation: 13564
I came across this post, which reports the following interview question:
Given two arrays of numbers, find if each of the two arrays have the same set of integers ? Suggest an algo which can run faster than NlogN without extra space?
The best that I can think of is the following:
(a) sort each array, and then (b) have two pointers moving along the two arrays and check if you find different values ... but step (a) has already NlogN complexity :(
(a) scan shortest array and put values into a map, and then (b) scan second array and check if you find a value that is not in the map ... here we have linear complexity, but we I use extra space
... so, I can't think of a solution for this question.
Ideas?
Thank you for all the answers. I feel many of them are right, but I decided to choose ruslik's one, because it gives an interesting option that I did not think about.
Upvotes: 26
Views: 9451
Reputation: 408
why not i find the sum , product , xor of all the elements one array and compare them with the corresponding value of the elements of the other array ??
the xor of elements of both arrays may give zero if the it is like
2,2,3,3 1,1,2,2
but what if you compare the xor of the elements of two array to be equal ???
consider this
10,3 12,5
here xor of both arrays will be same !!! (10^3)=(12^5)=9 but their sum and product are different . I think two different set of elements cannot have same sum ,product and xor ! This can be analysed by simple bitvalue examination. Is there anything wrong in this approach ??
Upvotes: 0
Reputation: 7393
Was just thinking if there was a way you could hash the cumulative of both arrays and compare them, assuming the hashing function doesn't produce collisions from two differing patterns.
Upvotes: 0
Reputation: 1337
In the algebraic decision tree model, there are known Omega(NlogN) lower bounds for computing set intersection (irrespective of the space limits).
For instance, see here: http://compgeom.cs.uiuc.edu/~jeffe/teaching/497/06-algebraic-tree.pdf
So unless you do clever bit manipulations/hashing type approaches, you cannot do better than NlogN.
For instance, if you used only comparisons, you cannot do better than NlogN.
Upvotes: 1
Reputation: 14870
You can try a probabilistic approach by choosing a commutative function for accumulation (eg, addition or XOR) and a parametrized hash function.
unsigned addition(unsigned a, unsigned b);
unsigned hash(int n, int h_type);
unsigned hash_set(int* a, int num, int h_type){
unsigned rez = 0;
for (int i = 0; i < num; i++)
rez = addition(rez, hash(a[i], h_type));
return rez;
};
In this way the number of tries before you decide that the probability of false positive will be below a certain treshold will not depend on the number of elements, so it will be linear.
EDIT: In general case the probability of sets being the same is very small, so this O(n) check with several hash functions can be used for prefiltering: to decide as fast as possible if they are surely different or if there is a probability of them being equivalent, and if a slow deterministic method should be used. The final average complexity will be O(n), but worst case scenario will have the complexity of the determenistic method.
Upvotes: 11
Reputation: 25644
A special, not harder case is when one array holds 1,2,..,n. This was discussed many times:
and despite many tries no deterministic solutions using O(1) space and O(n) time were shown. Either you can cheat the requirements in some way (reuse input space, assume integers are bounded) or use probabilistic test.
Probably this is an open problem.
Upvotes: 2
Reputation: 21
The usual assumption for these kinds of problems is Theta(log n)-bit words, because that's the minimum needed to index the input.
sshannin's polynomial-evaluation answer works fine over finite fields, which sidesteps the difficulties with limited-precision registers. All we need are a prime of the appropriate (easy to find under the same assumptions that support a lot of public-key crypto) or an irreducible polynomial in (Z/2)[x] of the appropriate degree (difficulty here is multiplying polynomials quickly, but I think the algorithm would be o(n log n)).
If we can modify the input with the restriction that it must maintain the same set, then it's not too hard to find space for radix sort. Select the (n/log n)th element from each array and partition both arrays. Sort the size-(n/log n) pieces and compare them. Now use radix sort on the size-(n - n/log n) pieces. From the previously processed elements, we can obtain n/log n bits, where bit i is on if a[2*i] > a[2*i + 1] and off if a[2*i] < a[2*i + 1]. This is sufficient to support a radix sort with n/(log n)^2 buckets.
Upvotes: 1
Reputation: 9182
All I know is that comparison based sorting cannot possibly be faster than O(NlogN), so we can eliminate most of the "common" comparison based sorts. I was thinking of doing a bucket sort. Perhaps if this qn was asked in an interview, the best response would first be to clarify what sort of data those integers represent. For e.g., if they represent a persons age, then we know that the range of values of int is limited, and can use bucket sort at O(n). However, this will not be in place....
Upvotes: -1
Reputation: 2825
Here is a co-rp algorithm:
In linear time, iterate over the first array (A), building the polynomial Pa = A[0] - x)(A[1] -x)...(A[n-1] - x). Do the same for array B, naming this polynomial Pb.
We now want to answer the question "is Pa = Pb?" We can check this probabilistically as follows. Select a number r uniformly at random from the range [0...4n] and compute d = Pa(r) - Pb(r) in linear time. If d = 0, return true; otherwise return false.
Why is this valid? First of all, observe that if the two arrays contain the same elements, then Pa = Pb, so Pa(r) = Pb(r) for all r. With this in mind, we can easily see that this algorithm will never erroneously reject two identical arrays.
Now we must consider the case where the arrays are not identical. By the Schwart-Zippel Lemma, P(Pa(r) - Pb(r) = 0 | Pa != Pb) < (n/4n). So the probability that we accept the two arrays as equivalent when they are not is < (1/4).
Upvotes: 1
Reputation: 56956
I'll assume that the integers in question are of fixed size (eg. 32 bit).
Then, radix-quicksorting both arrays in place (aka "binary quicksort") is constant space and O(n).
In case of unbounded integers, I believe (but cannot proof, even if it is probably doable) that you cannot break the O(n k) barrier, where k is the number of digits of the greatest integer in either array.
Whether this is better than O(n log n) depends on how k is assumed to scale with n, and therefore depends on what the interviewer expects of you.
Upvotes: 3
Reputation: 65854
You said "without extra space" in the question but I assume that you actually mean "with O(1) extra space".
Suppose that all the integers in the arrays are less than k. Then you can use in-place radix sort to sort each array in time O(n log k) with O(log k) extra space (for the stack, as pointed out by yi_H in comments), and compare the sorted arrays in time O(n log k). If k does not vary with n, then you're done.
Upvotes: 6
Reputation: 49826
If the arrays have the same size, and there are guaranteed to be no duplicates, sum each of the arrays. If the sum of the values is different, then they contain different integers.
Edit: You can then sum the log of the entries in the arrays. If that is also the same, then you have the same entries in the array.
Upvotes: -2
Reputation: 56762
For each integer i
check that the number of occurrences of i
in the two arrays are either both zero or both nonzero, by iterating over the arrays.
Since the number of integers is constant the total runtime is O(n)
.
No, I wouldn't do this in practice.
Upvotes: 0
Reputation: 96258
You can break the O(n*log(n)) barrier if you have some restrictions on the range of numbers. But it's not possible to do this if you cannot use any extra memory (you need really silly restrictions to be able to do that).
I would also like to note that even O(nlog(n)) with sorting is not trivial if you have O(1) space limit as merge sort uses O(n) space and quicksort (which is not even strict o(nlog(n)) needs O(log(n)) space for the stack. You have to use heapsort or smoothsort.
Some companies like to ask questions which cannot be solved and I think it is a good practice, as a programmer you have to know both what's possible and how to code it and also know what are the limits so you don't waste your time on something that's not doable.
Check this question for a couple of good techniques to use: Algorithm to tell if two arrays have identical members
Upvotes: 0
Reputation: 2797
I'm not sure that correctly understood the problem, but if you are interested in integers that are in both array:
If N >>>>> 2^SizeOf(int) (count of bit for integer (16, 32, 64)) there is one solution:
a = Array(N); //length(a) = N;
b = Array(M); //length(b) = M;
//x86-64. Integer consist of 64 bits.
for i := 0 to 2^64 / 64 - 1 do //very big, but CONST
for k := 0 to M - 1 do
if a[i] = b[l] then doSomething; //detected
for i := 2^64 / 64 to N - 1 do
if not isSetBit(a[i div 64], i mod 64) then
setBit(a[i div 64], i mod 64);
for i := 0 to M - 1 do
if isSetBit(a[b[i] div 64], b[i] mod 64) then doSomething; //detected
O(N), with out aditional structures
Upvotes: -1