Reputation: 1056
Given an array of n integer elements, how would you find whether there are duplicates in the array in O(n) time, without using any extra space?
By "extra space" I mean extra space of order O(n).
Does the XOR operator help in any way?
Upvotes: 28
Views: 54304
Reputation: 3321
To kickstart our visual thinking, let's have an example array: 50 100 150 -2 -1 0 1 2 3 4. As you can surely tell, it doesn't have duplicates, so our algorithm should output FALSE. Also, its length is 10.
Let's ignore the extra memory constraint for now (actually, violate it really badly, by assuming we can have infinite additional memory :) and save, in a fictional infinite array (it is also doubly-infinite, since it allows negative indices too), the counts for each integer. For our input, this array would look like this:
...000001111111000...00100...00100...001000000...
        ^               ^               ^
    [index -2]      [index 50]     [index 150]
If any of the elements of this array is greater than 1, then we have a duplicate and the algorithm should return TRUE.
Let's assume we have a map f(x): -inf..inf -> 0..N that can compress our infinite array to an array of size N, and furthermore do it in O(N) time. This is what hashing ideally does. Note that we don't care about maintaining the order of the array, as we only care whether it has counts above 1. Thus, we can combine these two steps and eliminate the need for infinite memory - yay! We are still using an additional O(N) memory (in fact, exactly N counts) to keep the count values. The next step will get rid of that.
Before I explain this step, notice that we don't really need to store any counts greater than 1. The first time we want to increase a counter and notice it already holds the value 1, we know we have found a duplicate! So 1 bit of memory per counter is enough. This shrinks each counter from a full integer down to a single bit, but that alone is still not good enough; the important part is simply that 1 bit of memory per counter suffices.
We are now going to exploit the fact that we can modify our input array. We go over the array and XOR every element with the value of the first element; if the result is smaller than the value before the operation, we replace the element with that result. We also store the first element separately as sw, at an additional O(1) memory cost.
Now, we can use the stored first element sw and the transformed array to encode the counts from the counting step (steps A + B) in the following manner: considering the element with index k of the array A, if A[f(A[k])] < A[f(A[k])] xor sw, then the count is zero, which means the element we are considering - A[k] - has not been seen before, so we change A[f(A[k])] to A[f(A[k])] xor sw. If, on the other hand, A[f(A[k])] > A[f(A[k])] xor sw, then the count is one, which means the element we are considering - A[k] - has already been seen before, so it is a duplicate.
Assuming the map:
f(-2 xr 50) -> 0
f(-1 xr 50) -> 1
f(0) -> 2
f(1) -> 3
f(2) -> 4
f(3) -> 5
f(4) -> 6
f(86) -> 7
f(150) -> 8
f(1337) -> 9
and after performing the steps in the following order: step c, then steps a+b, the input array looks like this:
50(0) 100(86) 150(164) -2(-2 xr 50) -1(-1 xr 50) 0(50) 1(51) 2(48) 3(49) 4(54) [intermediate state, not stored in memory]
0 86 150 -2 xr 50 -1 xr 50 0 1 2 3 4 [state after step c]
0 86 *164* -2 xr 50 -1 xr 50 0 1 2 3 4 [counted element 0]
0 86 164 -2 xr 50 -1 xr 50 0 1 *48* 3 4 [counted element 1]
0 86 164 -2 xr 50 -1 xr 50 0 1 48 *49* 4 [counted element 2]
*50* 86 164 -2 xr 50 -1 xr 50 0 1 48 49 4 [counted element 3]
50 *100* 164 -2 xr 50 -1 xr 50 0 1 48 49 4 [counted element 4]
50 100 !164! -2 xr 50 -1 xr 50 0 1 48 49 4 [counted element 5]
Trying to count the element with index 5, which is 0, we see that there already was a 0 in the array! (Because A[f(A[5])] is 164, which is greater than 164 xr 50.) So we output TRUE and the algorithm ends.
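For concreteness, here is a minimal sketch of the scheme in Java. It assumes a hypothetical perfect hash f, supplied as a parameter, that maps every post-step-c value of the array into 0..n-1 in O(1) time and space; as discussed below, no such general-purpose f exists, which is exactly where the scheme breaks down in practice.

import java.util.function.IntUnaryOperator;

public class XorCountSketch {
    // Sketch of the scheme above; `f` is a hypothetical perfect hash
    // that must map every post-step-c value of the array into 0..n-1.
    static boolean hasDuplicate(int[] a, IntUnaryOperator f) {
        int sw = a[0];                       // step c: remember the first element
        for (int i = 0; i < a.length; i++) { // normalize: keep the smaller of x, x^sw
            int t = a[i] ^ sw;
            if (t < a[i]) a[i] = t;
        }
        for (int k = 0; k < a.length; k++) { // steps a+b: one xor-toggle bit per counter
            int value = Math.min(a[k], a[k] ^ sw); // recover the post-step-c value
            int p = f.applyAsInt(value);
            if (a[p] < (a[p] ^ sw)) {
                a[p] ^= sw;                  // count 0 -> 1: mark as seen
            } else {
                return true;                 // count already 1: duplicate found
            }
        }
        return false;
    }
}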
If we are not allowed enough memory × time, we are bound to forget something and make a mistake.
Unfortunately, we don't have a perfect hash function, and we cannot just create memory out of thin air, so a traditional approach would not work under the required constraints. The algorithm that the answer given by the OP points to might be modified so that it allows using numbers which, interpreted as array indices, would fall outside the bounds of the array, given a perfect hash function. But even then, one would still have to work out how to use it to detect duplication, rather than to find a duplicate guaranteed to exist...
Upvotes: 0
Reputation: 1068
You can use swap sort to do all the necessary swaps and eventually find the duplicate elements. You can also scale this solution if you want to find more than one duplicate element in the same array. One caveat with this solution is that the range of the numbers in the array has to be from 1 to n.
Below is sample code for this. Input: [1,3,4,2,2]; here the duplicate element is 2. You can run this code on that input.
public int findDuplicate(int[] nums) {
    int i = 0;
    // Place each value v at index v - 1; every element is swapped at most
    // once into its final position, so this loop is O(n) overall.
    while (i < nums.length) {
        if (nums[i] != nums[nums[i] - 1]) {
            swap(i, nums[i] - 1, nums);
        } else {
            i++;
        }
    }
    // Any index that does not hold index + 1 holds a duplicate.
    for (int j = 0; j < nums.length; j++) {
        if (nums[j] != j + 1) {
            return nums[j];
        }
    }
    return -1;
}

public static void swap(int i, int j, int[] nums) {
    int temp = nums[i];
    nums[i] = nums[j];
    nums[j] = temp;
}
The main thing to keep in mind is that index i should contain the number i + 1, e.g. index 0 should hold element 1 (arr[0] = 1), index 1 should hold element 2 (arr[1] = 2), and so on and so forth.
This solution doesn't require any extra space, and the time complexity is O(n).
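A quick check of the routine with the sample input; the Main wrapper class is just scaffolding for this example, with findDuplicate and swap from above assumed to be its members.

public class Main {
    // findDuplicate and swap from the answer above go here.
    public static void main(String[] args) {
        int[] nums = {1, 3, 4, 2, 2};
        System.out.println(new Main().findDuplicate(nums)); // prints 2
    }
}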
Upvotes: -1
Reputation: 1
I would like to put the following solution on the table; I hope it helps.
assert(A.size() > 1);
int m = A[0];
for (unsigned int i = 1; i < A.size(); ++i) { // O(n)
    m ^= A[i];
}
// m is now the XOR of all elements: if exactly one value occurs an odd
// number of times and every other value occurs an even number of times,
// m is that value. This does not detect duplicates in the general case.
return m;
It is based on: https://codesays.com/2015/solution-to-odd-occurrences-in-array-by-codility/
Cheers
Upvotes: 0
Reputation: 480
A clean example that determines duplicates in O(n) time and O(1) space:
public class DuplicateDetermineAlgorithm {

    // Marks visits by negating array slots; this assumes every value
    // (after sign normalization) lies in the range 1..array.length.
    public static boolean isContainsDuplicate(int[] array) {
        if (array == null) {
            throw new IllegalArgumentException("Input array can not be null");
        }
        if (array.length < 2) {
            return false;
        }
        for (int i = 0; i < array.length; i++) {
            int pointer = convertToPositive(array[i]) - 1;
            if (array[pointer] > 0) {
                array[pointer] = changeSign(array[pointer]);
            } else {
                return true;
            }
        }
        return false;
    }

    private static int convertToPositive(int value) {
        return value < 0 ? changeSign(value) : value;
    }

    private static int changeSign(int value) {
        return -1 * value;
    }
}
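A quick usage example; per the comment in the class, the values are assumed to lie in 1..n:

int[] withDup = {1, 2, 3, 2};
System.out.println(DuplicateDetermineAlgorithm.isContainsDuplicate(withDup)); // true

int[] noDup = {1, 2, 3, 4};
System.out.println(DuplicateDetermineAlgorithm.isContainsDuplicate(noDup)); // false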
Upvotes: 1
Reputation: 41230
Depending on what you actually consider the time complexity of a radix sort to be, this solution is O(N) time, although my personal opinion is that it is not. I think that if you don't make the linear-time assumption for the integer sort, then the problem is unsolvable.
Because the sort is in-place, only O(1) additional storage is required.
The code is all C++11. The complete program, including a small test driver, follows.
#include <iostream>
#include <string>
#include <vector>
#include <algorithm>
#include <cstdlib>
#include <ctime>
#include <type_traits>

using namespace std;

#define N 10

template <typename T>
void PrintArray(const std::vector<T>& myArray)
{
    for (auto&& element : myArray)
    {
        std::cout << element << std::endl;
    }
}

template <typename T, typename std::enable_if<std::is_integral<T>::value>::type* = nullptr>
void RecurseOnRadixSort(std::vector<T>& myArray, T mask, int zerosEnd, int onesBegin)
{
    if (zerosEnd+1 >= onesBegin-1 || mask == 0)
        return;

    int zerosEnd2 = zerosEnd;
    int onesBegin2 = onesBegin;
    while (zerosEnd2+1 <= onesBegin2-1)
    {
        // swap ones to the right
        if ((myArray[zerosEnd2+1] & mask) != 0)
        {
            std::swap(myArray[zerosEnd2+1], myArray[onesBegin2-1]);
            --onesBegin2;
        }
        else
            ++zerosEnd2;
    }

    mask >>= 1;

    // recurse on lhs
    RecurseOnRadixSort(myArray, mask, zerosEnd, zerosEnd2+1);

    // recurse on rhs
    RecurseOnRadixSort(myArray, mask, onesBegin2-1, onesBegin);
}

template <typename T, typename std::enable_if<std::is_integral<T>::value>::type* = nullptr>
void InPlaceRadixSort(std::vector<T>& myArray)
{
    int zerosEnd = -1;
    int onesBegin = static_cast<int>(myArray.size());
    T mask = static_cast<T>(1) << sizeof(T)*8-1;
    while (zerosEnd+1 <= onesBegin-1)
    {
        if ((myArray[zerosEnd+1] & mask) != 0)
        {
            std::swap(myArray[zerosEnd+1], myArray[onesBegin-1]);
            --onesBegin;
        }
        else
            ++zerosEnd;
    }

    mask = static_cast<T>(1) << sizeof(T)*8-2; // need to reassign in case of signed datatype

    // recurse on lhs
    RecurseOnRadixSort(myArray, mask, -1, zerosEnd+1);

    // recurse on rhs
    RecurseOnRadixSort(myArray, mask, onesBegin-1, static_cast<int>(myArray.size()));

    // swap negatives to the front
    auto iterSmallest = std::min_element(myArray.begin(), myArray.end());
    if (*iterSmallest < 0)
    {
        std::reverse(myArray.begin(), myArray.end());
        iterSmallest = std::min_element(myArray.begin(), myArray.end());
        std::reverse(myArray.begin(), iterSmallest+1);
        std::reverse(iterSmallest+1, myArray.end());
    }
}

int main() {
    srand(time(NULL));
    std::vector<int> myArray(N);
    for (size_t i = 0; i < myArray.size(); ++i)
    {
        myArray[i] = rand() % 100 * (rand() % 2 == 1 ? -1 : 1);
    }

    std::cout << "Vector before radix sort: " << std::endl;
    PrintArray(myArray);

    InPlaceRadixSort(myArray);

    std::cout << "Vector after radix sort: " << std::endl;
    PrintArray(myArray);

    // after sorting, any duplicates sit next to each other
    for (size_t i = 0, j = 1; j < myArray.size(); ++i, ++j)
    {
        if (myArray[i] == myArray[j])
        {
            std::cout << "Found duplicate element " << myArray[i];
        }
    }

    return 0;
}
Upvotes: 7
Reputation: 178521
If there is no additional information, this question seems to be unsolvable, as this is the Element Distinctness Problem, which is unsolvable in the required time with the restrictions you provided.
You can allow:
(1) More memory: use a hashtable / hashset and meet the O(n) time criterion. [Iterate over the array and check whether each element is already in the hash table; if it is, you have dupes, otherwise insert the element into the table and continue. Sketched below.]
(2) More time: sort the array [O(n log n)] and meet the sub-linear space criterion. [After sorting, iterate over the array, and for each a[i], a[i+1] check whether they are identical. If you find no identical pair, you have no dupes.]
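For illustration, a minimal sketch of option (1) in Java (any hash set implementation will do):

import java.util.HashSet;
import java.util.Set;

public class DistinctnessCheck {
    // Option (1): O(n) expected time, O(n) extra space.
    static boolean hasDuplicates(int[] a) {
        Set<Integer> seen = new HashSet<>();
        for (int x : a) {
            if (!seen.add(x)) // add() returns false if x was already present
                return true;
        }
        return false;
    }
}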
EDIT: The proof of this claim is a bit lengthy and needs mathematical notation that is not supported here (sidenote: we really need TeX support), but the idea is this: if we model our problem as an Algebraic Computation Tree (which is a fair assumption when no hashing is allowed and we have only constant space at our disposal), then Ben-Or proved in his article Lower Bounds For Algebraic Computation Trees (1983) (published by the prestigious ACM) that element distinctness is an Omega(n log n) problem under this model. Lubiw showed that the same conclusion also applies when limiting ourselves to integers in 1991: A Lower Bound for the Integer Element Distinctness Problem. These articles conclude that, under the algebraic computation tree model, the integer element distinctness problem is an Omega(n log n) problem.
Upvotes: 29
Reputation: 899
For the general case, this problem doesn't seem to have a solution, due to the strong complexity constraints and the unrestricted input.
It is clear that you need at least N steps to even SEE all of the input, so it cannot be faster than O(n).
Now, to make sure you spot every possible duplicate, you have different possibilities:

- Compare every element against every other element: O(n^2) time.
- Sort first, then compare neighbours: comparison-based sorting takes O(n log n) time (Wikipedia has a rather lengthy writeup with lots of sources, including a comparison chart of known sorting algorithms), so you can never meet the time requirement that way.
- Keep bookkeeping of the values seen so far, in O(n) time, but that bookkeeping needs to be stored somewhere. Otherwise, you will simply "forget" which numbers you have already seen. Unfortunately, the bookkeeping will require more space as your input grows, because you have that many more different numbers to remember. So it is impossible to have the same fixed amount of memory and compare arbitrarily long input sequences; you would have to violate the constant-space constraint O(1).
As @Atishay points out in his answer, there can be a solution if you have a very restricted input: it requires that you have an array of size n whose possible values are only in the range [0, n-2]. This requirement guarantees that there MUST be a duplicate somewhere, because there are fewer distinct values than elements in the array. With that knowledge and the very specific range of values, you can do it. But this uses very narrow assumptions and does not solve the general problem stated in the question.
As clarified in the comments, there is a proven lower bound for the time complexity of comparison-based sorting algorithms. For reference, see here:
Upvotes: 2
Reputation: 13244
This solution is based upon one by @dsimcha that removes duplicates from an array, as can be found here.
It performs an in-place swapping algorithm, with hashes of values used to pick swap positions. Note that this destroys the original array content to some extent, but nothing in the OP's question forbade that.
public static class DupFinder
{
    public static bool HasDups(int[] array, ref int nEvals)
    {
        nEvals = 0;
        return DupFinder.FindInPlace(array, 0, ref nEvals);
    }

    private static bool FindInPlace(int[] array, int start, ref int nEvals)
    {
        if (array.Length - start < 2)
            return false;
        var sentinel = array[start];
        var offset = start + 1;
        var len = array.Length - offset;
        for (var ndx = 0; ndx < len; nEvals++)
        {
            var cur = array[offset + ndx];
            if (cur == sentinel)
            {
                ndx++;
                continue;
            }
            var hash = cur % len;
            if (ndx == hash)
            {
                ndx++;
                continue;
            }
            var at_hash = array[offset + hash];
            if (cur == at_hash)
            {
                array[offset + ndx] = sentinel;
                ndx++;
                continue;
            }
            if (at_hash == sentinel)
            {
                Swap(array, offset, ndx, hash);
                ndx++;
                continue;
            }
            var hash_hash = at_hash % len;
            if (hash_hash != hash)
            {
                Swap(array, offset, ndx, hash);
                if (hash < ndx)
                    ndx++;
            }
            else
            {
                ndx++;
            }
        }

        var swapPos = 0;
        for (var i = 0; i < len; i++, nEvals++)
        {
            var cur = array[offset + i];
            if (cur != sentinel && i == (cur % len))
                Swap(array, offset, i, swapPos++);
        }

        for (var i = swapPos; i < len; nEvals++)
        {
            var cur = array[offset + i];
            if (cur == sentinel)
                return true; // got dups.
            else
                i++;
        }

        // Let's assume C# supports tail recursion ;-)
        // Then => look ma, O(1) extra storage space.
        return FindInPlace(array, offset + swapPos, ref nEvals);
    }

    private static void Swap(int[] array, int offset, int first, int second)
    {
        var tmp = array[offset + first];
        array[offset + first] = array[offset + second];
        array[offset + second] = tmp;
    }
}
Thus, if we assume for a moment that C# supports tail recursion, and we don't count the used stack frames as extra space, it has O(1) space requirements.
The author mentions it as having O(N)-ish time complexity. The (limited) tests I've performed (as opposed to a computational complexity analysis) would indicate it is closer to O(N log N).
Array Size    Dup Position    #Evals
        12               7        26
        12               -        35
   100,000          80,000   279,997
   100,000               -   453,441
Upvotes: 2
Reputation: 6331
public static void getDuplicatesElements(Integer arr[]) {
    // Status array to track the elements if they are already considered
    boolean status[] = new boolean[arr.length];
    // Flag to mark the element found its duplicate
    boolean dupFlag = false;
    // Output string
    String output = "";
    // Count of duplicate elements found
    int count = 0;

    // Initialize status array with all false i.e. no duplicates
    for (int i = 0; i < arr.length; i++) {
        status[i] = false;
    }

    // First loop to check every element
    for (int i = 0; i < arr.length - 1; i++) {
        // Initialize every element to no duplicate
        dupFlag = false;
        // Check if this element is not already a found duplicate; if not, check now.
        if (!status[i]) {
            for (int j = i + 1; j < arr.length; j++) {
                // Use equals(): == on Integer compares references, not values
                if (arr[i].equals(arr[j])) {
                    dupFlag = true;
                    status[j] = true;
                }
            }
        }
        if (dupFlag) {
            output = output + " " + arr[i];
            count++;
        }
    }
    System.out.println("Duplicate elements: " + output);
    System.out.println("Count: " + count);
}
Upvotes: 0
Reputation: 15579
An implementation using a single int as a temporary variable, treating it as a bit vector:
public static boolean isUniqueChars(String str) {
    int checker = 0;
    for (int i = 0; i < str.length(); ++i) {
        // Assumes lowercase 'a'..'z', so each character maps to one of 26 bits
        int val = str.charAt(i) - 'a';
        if ((checker & (1 << val)) > 0) return false;
        checker |= (1 << val);
    }
    return true;
}
Or my previous O(n^2) implementation without using any temporary variable:
public static boolean isDuplicate(char[] str) {
    if (str == null) return false;
    int len = str.length;
    if (len < 2) return false;
    for (int i = 1; i < len; ++i) {
        // Only compare against earlier positions; comparing str[i]
        // with itself would always report a duplicate.
        for (int j = 0; j < i; ++j) {
            if (str[i] == str[j]) return true;
        }
    }
    return false;
}
Upvotes: 1
Reputation: 5
import java.util.HashSet;
import java.util.Set;

public class FindDups {
    public static void main(String[] args) {
        int a[] = {1, 2, 3, 3, 4};
        Set<Integer> s = new HashSet<Integer>();
        for (int i = 0; i < a.length; i++) {
            // add() returns false when the element is already in the set
            if (!s.add(a[i]))
                System.out.println("at index " + i + " " + a[i] + " is duplicate");
        }
        for (int i : s) {
            System.out.println(i);
        }
    }
}
Upvotes: -2
Reputation: 151
Here is a solution with O(n) time usage and O(1) space usage!
Traverse the array and do the following for every index i of A[]:
{
    check the sign of A[abs(A[i])];
    if positive then make it negative by A[abs(A[i])] = -A[abs(A[i])];
    else // i.e., A[abs(A[i])] is negative
        this element (ith element of list) is a repetition
}
Credits: Method 5 from GeeksforGeeks.
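A runnable sketch of the pseudocode above. Note the implicit assumptions: every value must be nonzero (the sign marking cannot flag a 0) and small enough to be a valid index, i.e. in the range 1..n-1, and the input array is mutated.

public class SignMarking {
    // Sketch of the sign-marking method above; assumes all values in 1..n-1.
    static boolean hasRepetition(int[] a) {
        for (int i = 0; i < a.length; i++) {
            int idx = Math.abs(a[i]);
            if (a[idx] > 0) {
                a[idx] = -a[idx]; // first visit: mark as seen
            } else {
                return true;      // already negative: repetition found
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // With n values drawn from 1..n-1, a repetition always exists
        // (pigeonhole), so this prints true.
        System.out.println(hasRepetition(new int[]{3, 1, 2, 3}));
    }
}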
Upvotes: 2
Reputation: 1056
Here's an interesting solution to this problem with a single constraint: the elements must range between 0 and n-2 (inclusive), where n is the number of elements.
This works in O(n) time with O(1) space complexity.
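The linked solution itself is not reproduced above. One classic algorithm that meets exactly these constraints is Floyd's cycle detection, sketched here (this may or may not be the approach the link describes):

public class FloydDuplicate {
    // Treat each index i as a node whose successor is a[i] + 1. Values lie
    // in 0..n-2, so successors lie in 1..n-1 and node 0 has no incoming
    // edge; the pigeonhole-guaranteed duplicate becomes the entry of a
    // cycle, found by tortoise-and-hare in O(n) time and O(1) space
    // without modifying the array.
    static int findDuplicate(int[] a) {
        int slow = 0, fast = 0;
        do {                      // phase 1: find a meeting point inside the cycle
            slow = a[slow] + 1;
            fast = a[a[fast] + 1] + 1;
        } while (slow != fast);
        slow = 0;                 // phase 2: find the cycle entry
        while (slow != fast) {
            slow = a[slow] + 1;
            fast = a[fast] + 1;
        }
        return slow - 1;          // entry node d+1 corresponds to duplicate value d
    }

    public static void main(String[] args) {
        System.out.println(findDuplicate(new int[]{1, 0, 2, 1})); // prints 1
    }
}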
Upvotes: 4
Reputation: 609
A Bloom filter is a space-efficient hash set with a tunable false-positive rate. The possibility of false positives means you have to go back and check for a real duplicate when you get a hit from the BF, which introduces an N^2 term - but the coefficient is ~exp(-(extra space used for the filter)). This produces an interesting space-vs-time tradeoff.
I don't have a proof that the question as posed is insoluble, but in general "here's an interesting tradeoff space" is a good answer to an insoluble problem.
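A sketch of the check-then-verify loop in Java, with the filter size m (bits) and hash count k as the tunable knobs mentioned above; the BitSet and the mixing hash are implementation choices of this sketch, not prescribed by the answer:

import java.util.BitSet;

public class BloomDupCheck {
    // A small Bloom filter screens candidates; a Bloom hit may be a false
    // positive, so it triggers a linear re-scan of the prefix to confirm
    // a real duplicate. A true duplicate always hits (no false negatives).
    static boolean hasDuplicate(int[] a, int m, int k) {
        BitSet bits = new BitSet(m);
        for (int i = 0; i < a.length; i++) {
            boolean maybeSeen = true;
            for (int h = 0; h < k; h++) {
                if (!bits.get(hash(a[i], h, m))) {
                    maybeSeen = false; // some bit unset: definitely not seen
                    break;
                }
            }
            if (maybeSeen) {           // possible duplicate: verify by re-scan
                for (int j = 0; j < i; j++) {
                    if (a[j] == a[i]) return true;
                }
            }
            for (int h = 0; h < k; h++) {
                bits.set(hash(a[i], h, m));
            }
        }
        return false;
    }

    // Simple seeded integer hash; any decent mixing function works here.
    private static int hash(int x, int seed, int m) {
        int z = x * 0x9E3779B9 + seed * 0x85EBCA6B;
        z ^= z >>> 16;
        return Math.floorMod(z, m);
    }
}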
Upvotes: 2