Reputation: 65

C++, Remove duplicate items

What is wrong in this code? I am trying to remove duplicate elements from the user defined container.

template <typename Item>
struct TList
{
    typedef std::vector <Item> Type;
};

template <typename Item>
class GenericContainer
{
    protected:
            typename TList <Item>::Type items;
};

There is a method removing duplicate elements in container specialized for Item * (ie items are dynamically allocated):

template <typename Item>
template <typename Sort, typename Equal>
void Container <Item * >::remove ( typename TList <Item *>::Type ::iterator it_begin, typename TList <Item *>::Type ::iterator it_end,     Sort sort, Equal equal )
{           //Sort items, ok 
    std::sort ( it_begin, it_end, sort );

            //Apply unique, OK
    typename TList <Item *>::Type ::iterator i_new_end = std::unique ( it_begin, it_end, equal );

            //Delete items after new end of container, OK
    for (typename TList <Item *>::Type ::iterator i_item = i_new_end ; i_item != it_end; i_item ++)
    {
        delete *i_item; //call destructor
    }

            //Erase duplicate items
    this->items.erase ( i_new_end, it_end );

            //Another sort, Exception: Acces in free memory
    std::sort ( it_begin, i_new_end, sort );

);

I can not find the problem in line

            //Another sort, Exception: Acces in free memory
    std::sort ( it_begin, i_new_end, sort );

or lines before...

Error log:
Access in freed memory in process: sw.exe(line 796)

Thanks for your advice.

Updated question:

I translate the code with another compiler (MSVS 2010) and checked items of vector. Here are the results:

Input dataset: 628 items.

A) std::sort ( it_begin, it_end, sort );

628 items

B) typename TList <Item *>::Type ::iterator i_new_end = std::unique ( it_begin, it_end, equal );

612 unique items

C) for (typename TList <Item *>::Type ::iterator i_item = i_new_end ; i_item != it_end; i_item ++)
{
    delete *i_item; //call destructor
}

To my surprise items since item [595] have been deleted (why not item[613]???). I do not understand such strange behavior...

Upvotes: 3

Answers (4)

Potatoswatter

Reputation: 137940

My bet is that not only do some values appear more than once, some individual objects appear more than once in the sequence.

When you delete all the duplicate values, you destroy some of the objects that remain in the sequence.

The second sort (which is unnecessary, since unique does not rearrange things as it removes duplicates) accesses every object, so it immediately steps on the ones which were just deleted.

One possible solution is to sort the pointers in both ranges resulting from unique. Use set_difference( i_new_end, it_end, i_begin, i_new_end, i_sequence_inserter ) to find the objects that actually need to be freed — assuming they aren't still being used anywhere else.

Or, just use smart pointers, or no pointers at all :v) .

Edit:

See my comment — the best solution is likely to eliminate the use of pointers entirely.

Anyway, here is a sample set_difference solution, this one using a custom iterator instead of a sequence inserter.

template <typename Item>
template <typename Sort, typename Equal>
void Container <Item * >::remove ( typename TList <Item *>::Type ::iterator it_begin, typename TList <Item *>::Type ::iterator it_end,     Sort sort, Equal equal )
{           //Sort items, ok 
    std::sort ( it_begin, it_end, sort );

            //Apply unique, OK
    typename TList <Item *>::Type ::iterator i_new_end = std::unique ( it_begin, it_end, equal );

    // Now sort the sub-ranges on pointer values to identify duplicate pointers
    std::sort( it_begin, i_new_end );
    std::sort( i_new_end, it_end );

    // delete all pointers that appear only in the set of duplicate values to be erased
    struct deleter {
        deleter &operator *() { return *this; } // assignment to target is assgn. to iter
        deleter &operator =( Item *rhs ) { delete rhs; return *this; }
        deleter &operator ++() { return *this; } // increment is no-op
        deleter &operator ++(int) { return *this; } // increment is no-op
    };
    std::set_difference( i_new_end, it_end, it_begin, i_new_end, deleter() );

            //Erase duplicate items
    this->items.erase ( i_new_end, it_end );

            //Another sort, Exception: Acces in free memory
    std::sort ( it_begin, i_new_end, sort );

);

Upvotes: 2

Tim

Reputation: 9172

After a little testing, I think i_new_end is invalidated by the .erase() call.

According to documentation, it should still be valid -- only pointers after i_new_end are supposed to be invalidated. But VisualC++ apparently doesn't follow that. It almost does the right thing, i_new_end and end() both point to the same location in memory, but _Mycont is reset to zero for i_new_end, which invalidates the pointer.

That is, even though they point to the same address, this fails:

  if (i_new_end._Myptr == end()._Myptr)  // yes, they're the same
  if (i_new_end == end())  // boom.  exception

So, try this:

// remember next-to-last item
typename TList <Item *>::Type ::iterator nextlast = i_new_end - 1;

//Erase duplicate items
this->items.erase ( i_new_end, it_end );

// reset i_new_end
i_new_end = nextlast + 1;

// Now this line should not throw
std::sort ( it_begin, i_new_end, sort );

Also keep in mind the memory issues mentioned by Potatoswatter.

Upvotes: 0

Mark B

Reputation: 96311

It looks to me like the passed in iterators aren't actually from this->items but from some other container.

Another possibility is that delete *i_item is causing the problem and it only shows up in the final line as a delayed reaction.

EDIT: One other option that comes up way more often than you'd expect is that your sort predicate doesn't obey strict weak ordering. Could you show the code for that?

Upvotes: 0

rturrado

Reputation: 8074

Sounds like iterators used in last call to std::sort may be invalid. Probably it_begin. Can you try using this->items.begin() instead?

Upvotes: 0

C++, Remove duplicate items

Answers (4)

Edit:

Related Questions