WhitneyD

Reputation: 127

How to remove duplicated items in a sorted vector

I currently have a vector<int> c which contains {1,2,2,4,5,5,6} and I want to remove the duplicated numbers so that c will have {1,4,6}. A lot of solutions on the internet seem to just remove one of the duplicates, but I'm trying to remove all occurrences of the duplicated number.

Assume c is always sorted.

I currently have

#include <iostream>
#include <vector>

int main() {
  std::vector<int> c{1, 2, 2, 4, 5, 5, 6};
  for (std::size_t i = 0; i + 1 < c.size(); i++) {
    for (std::size_t j = i + 1; j < c.size(); j++) {
      if (c[i] == c[j]) {
         // delete i and j?
      }
    }
  }
}

I tried to use two for-loops so that I can compare the current element with the elements after it, but this is where my doubts kicked in: I'm not sure I'm approaching the problem correctly.

Could I get help on how to approach my problem?

Upvotes: 10

Views: 2379

Answers (7)

Jarod42

Reputation: 217593

Similarly to std::unique/std::copy_if, you might do:

#include <algorithm>  // std::find_if
#include <iterator>   // std::distance
#include <utility>    // std::move
#include <vector>

void keep_unique(std::vector<int>& v){
    auto it = v.begin();
    auto w = v.begin();
    while (it != v.end())
    {
        auto next = std::find_if(it, v.end(), [&](int e){ return e != *it; });
        if (std::distance(it, next) == 1) {
            if (w != it) {
                *w = std::move(*it);
            }
            ++w;
        }
        it = next;
    }
    v.erase(w, v.end());
}


Upvotes: 2

Rotem

Reputation: 21927

Iterating in reverse means we never skip elements after an erase, and no other data structures need to be allocated. Note, though, that std::vector::erase still shifts any trailing elements left, so this first version is not strictly O(N); the update below gives a true O(N) variant.

For every element encountered, we check if the adjacent element is equal, and if so, remove all instances of that element.

Requires the vector to be sorted, or at least grouped by duplicates.

#include <iostream>
#include <vector> 
int main()
{
    std::vector<int> c {1, 2, 2, 4, 5, 5, 6};
    
    for (int i = c.size() - 1; i > 0;) 
    {
        const int n = c[i];
        if (c[i - 1] == n)
        {
            while (i >= 0 && c[i] == n) // guard: don't run off the front
            {
                c.erase(c.begin() + i--);
            }
        }
        else
        {
            i--;
        }

    }

    //output result
    for (auto it : c)
    {
        std::cout<<it;
    }
    std::cout << std::endl;
}

Output: 146

Update

An actual O(N) implementation using a sentinel value:

#include <iostream>
#include <vector> 
#include <limits>
#include <algorithm>

int main()
{
    std::vector<int> c { 1, 2, 2, 4, 5, 5, 6 };
    const int sentinel = std::numeric_limits<int>::lowest(); //assumed that no valid member uses this value.
    for (std::size_t i = 0; i + 1 < c.size();) 
    {
        const int n = c[i];
        if (c[i + 1] == n)
        {
            while (i < c.size() && c[i] == n) // guard: duplicates may run to the end
            {
                c[i++] = sentinel;
            }
        }
        else
        {
            i++;
        }
    }
    
    c.erase(std::remove(c.begin(),c.end(),sentinel), c.end());

    for (auto it : c) std::cout<< it << ' ';
}
        

Upvotes: 1

Jerry Coffin

Reputation: 490228

Almost any time you find yourself deleting elements from the middle of a vector, it's probably best to sit back and think about whether this is the best way to do the job--chances are pretty good that it isn't.

There are a couple of obvious alternatives to that. One is to copy the items you're going to keep into a temporary vector, then when you're done, swap the temporary vector and the original vector. This works particularly well in a case like you've shown in the question, where you're keeping only a fairly small minority of the input data.

The other is to rearrange the data in your existing vector so all the data you don't want is at the end, and all the data you do want is at the beginning, then resize your vector to eliminate those you don't want.

When in doubt, I tend to go the first route. In theory it's probably a bit less efficient (poorer locality of reference), but I've rarely seen a significant slow-down in real use.

That being the case, my initial take would probably be something on this general order:

#include <vector>
#include <iostream>
#include <iterator>
#include <algorithm> // std::copy

std::vector<int> remove_all_dupes(std::vector<int> const &input) {

    if (input.size() < 2) // zero or one element is automatically unique
        return input;

    std::vector<int> ret;

    // first item is unique if it's different from its successor
    if (input[0] != input[1])
        ret.push_back(input[0]);

    // in the middle, items are unique if they're different from both predecessor and successor
    for (std::size_t pos = 1; pos < input.size() - 1; pos++)
        if (input[pos] != input[pos-1] && input[pos] != input[pos+1])
            ret.push_back(input[pos]);

    // last item is unique if it's different from predecessor
    if (input[input.size()-1] != input[input.size()-2])
        ret.push_back(input[input.size() - 1]);

    return ret;
}

int main() {
    std::vector<int> c { 1, 2, 2, 4, 5, 5, 6 };
    std::vector<int> uniques = remove_all_dupes(c);

    std::copy(uniques.begin(), uniques.end(), std::ostream_iterator<int>(std::cout, "\n"));
}

Probably a little longer than we'd really prefer, but still simple, straightforward, and efficient.

If you are going to do the job in place, the usual way to do it efficiently (and this applies to filtering in general, not just this particular filter) is to start with a copying phase and follow that by a deletion phase. In the copying phase, you use two pointers: a source and a destination. You start them both at the first element, then advance through the input with the source. If it meets your criteria, you copy it to the destination position, and advance both. If it doesn't meet your criteria, advance only the source.

Then when you're done with that, you resize your vector down to the number of elements you're keeping.

void remove_all_dupes2(std::vector<int> & input) {
    if (input.size() < 2)   { // 0 or 1 element is automatically unique
        return;
    }

    std::size_t dest = 0;

    if (input[0] != input[1])
        ++dest;

    for (std::size_t source = 1; source < input.size() - 1; source++) {
        if (input[source] != input[source-1] && input[source] != input[source+1]) {
            input[dest++] = input[source];
        }
    }
    if (input[input.size()-1] != input[input.size()-2]) {
        input[dest++] = input[input.size() - 1];
    }
    input.resize(dest);
}

At least in my view, the big thing to keep in mind here is the general pattern. You'll almost certainly run into a lot more situations where you want to filter some inputs to those that fit some criteria, and this basic pattern of tracking source and destination, and copying only those from the source to the destination that fit your criteria works well in a lot of situations, not just this one.

Upvotes: 7

Christopher Miller

Reputation: 3461

This code is based on the insight that an element is unique in a sorted list if and only if it is different from both elements immediately adjacent to it (except for the starting and ending elements, which are adjacent to one element each). This is true because all identical elements must be adjacent in a sorted array.

void keep_unique(std::vector<int> &v){
    if(v.size() < 2){return;}

    std::vector<int> unique;

    // check the first element individually
    if(v[0] != v[1]){
        unique.push_back(v[0]);
    }

    for(unsigned int i = 1; i < v.size()-1; ++i){
        if(v[i] != v[i-1] && v[i] != v[i+1]){
            unique.push_back(v[i]);
        }
    }

    // check the last item individually
    if(v[v.size()-1] != v[v.size()-2]){
        unique.push_back(v[v.size()-1]);
    }

    v = unique;
}

Upvotes: 4

Quimby

Reputation: 19133

Generally one has to be very careful when deleting from containers while iterating over them. The standard library can do this concisely, and faster on average than nested loops:

#include <iostream>
#include <vector>
#include <algorithm>
#include <unordered_set>

int main() {
    std::vector<int> c{1,2,2,4,5,5,6};
    std::unordered_multiset<int> unique( c.begin(), c.end() );
    c.erase(std::remove_if(c.begin(),c.end(),[&](const auto& e){return unique.count(e)>1;}),c.end());

    for(auto e: c){
        std::cout<<e<<' ';
    }
}

//Output: 1 4 6

Alternatively, you could use a std::map<int,std::size_t> and count the occurrences that way.

Upvotes: 2

Tony Tannous

Reputation: 14886

Use std::remove_if to shift the items you want to keep to the front, then erase the leftover tail. Note that std::count rescans the whole vector for every element, making this O(N²), and the lambda reads the vector while std::remove_if is rearranging it; for larger inputs, counting into a separate container first is safer.

#include <iostream>
#include <vector>
#include <algorithm>


int main()
{

    std::vector<int> V {1,2,2,4,5,5,6};

    auto it = std::remove_if(V.begin(), V.end(), [&](const auto& val) 
    { 
        return std::count(V.begin(), V.end(), val) > 1;
    });

    V.erase(it, V.end());

    for (const auto& val : V)
        std::cout << val << std::endl;
    
    return 0;
}

Output:

1
4
6

For demo: https://godbolt.org/z/j6fxe1

Upvotes: 1

Deepak Tatyaji Ahire

Reputation: 5311

This can be achieved with careful use of iterators. Note that std::vector::erase invalidates iterators at and after the erased position, so we must continue from the iterator that erase returns.

Have a look at the following code:

#include <iostream>
#include <vector>

int main() {
    
    std::vector<int> c{1,2,2,4,5,5,6};

    for (auto it = c.begin(); it != c.end(); ){
        
        auto it2 = it + 1;
        
        bool isDuplicate = false;
        
        for(; it2 != c.end(); ){
            if(*it == *it2){
                it2 = c.erase(it2); // erase returns the next valid iterator
                isDuplicate = true;
            }else{
                ++it2;
            }
        }
        
        if(isDuplicate){
            it = c.erase(it); // also remove the element that had duplicates
        }else{
            ++it;
        }
    }
    
    for (auto it = c.begin(); it != c.end(); it++){
        
        std::cout<<*it<<" ";
    }
}

Output:

1 4 6 

Upvotes: 0
