Reputation: 749
I have a struct
which roughly looks like that:
struct vec3
{
int x;
int y;
int z;
constexpr bool operator==(const vec3& v) const
{
return (x == v.x) && (y == v.y) && (z == v.z);
}
constexpr vec3 operator-() const
{
return {-x, -y, -z};
}
};
I then generate a std::vector
of vec3
with random values for each coordinates.
The function in which it is used requires that no couple of values {v1, v2}
in that vector verifies v1 == -v2
. I obviously need that definition of operator==
(the one in the snippet above) elsewhere in code, otherwise that problem would be trivial.
I first attempted std::set
and std::sort
+ std::unique
, but could not find any way to have a struct
filing named requirements Compare for that application (which is needed for both approaches).
How can I proceed?
Note:
This is somewhat different from Removing duplicates from a non-sortable vector in which pointers are sorted, and also from C++ how to remove duplicates from vector of Class type? in which the elements are sortable according to some criteria (I think).
Upvotes: 0
Views: 518
Reputation: 749
I believe the simplest method would be to use std::unordered_set
and exploit its second and third template
parameters.
This step goal is a provide a " pre-filtering " step which eliminates obvious duplicates according to the meaning in the context (so here, v1
and -v1
should for example have the same hash).
That's something that should be on a per-class basis . There is no way to come up with a generic performant hashing function, though non-performant critical application may use the first hasher below (but I won't really recommend that anymore).
a. The bad hasher
This is what I originally proposed, before taking comment from @axnsan and @François Andrieux in count.
The simplest hasher I can think of looks like that:
struct bad_hasher
{
std::size_t operator()(const value_type&) const
{
return 0;
}
};
It makes all hash collide and forces std::unordered_set
to use KeyEqual
to determine objects equality.
So indeed, that works, but that does not make it a good choice. @axnsan and @François Andrieux pointed the following drawbacks in the comment below:
In other words, this makes std::unordered_set
become the same as a std::vector
+ std::remove_if
.
b. The better hasher
@axnsan suggests the following hasher for this particular application:
struct better_hasher
{
std::size_t operator()(const vec3& v) const
{
return static_cast<std::size_t>(std::abs(v.x) + std::abs(v.y) + std::abs(v.z));
}
};
It fills the following requirements:
better_hasher(v) == better_hasher(-v)
.v1 != v2
=> better_hasher(v1) != better_hasher(v2)
in most cases ((1, 0, 1)
will collide with (1, 1, 0)
for example)Which is probably somewhere near the best we could hope for in this configuration.
We then need to remove those "false positives" due to hash collisions.
The goal here is to remove duplicates that were not detected by the hasher (here, typically vectors such as (1, 0, 1) / (1, 1, 0)
or overflow).
Declare a predicate struct
which roughly looks like:
struct key_equal
{
bool operator()(const value_type& a, const value_type& b) const
{
return (a == b) || <...>;
}
};
Where <...>
is anything making two values identical in the current context ( so here a == b) || -a == b
for example).
Note that this expects operator==
to be implemented.
Declare an std::unordered_set
which removes duplicates:
const std::unordered_set<value_type, hasher, key_equal> s(container.begin(), container.end());
container.assign(s.begin(), s.end());
Basically the same, but this checks if an element can be inserted in the std::unordered_set
, and if does not, remove it. Adapted from @yury's answer in What's the most efficient way to erase duplicates and sort a vector?.
std::unordered_set<value_type, hasher, comparator> s;
const auto predicate = [&s](const value_type& value){return !s.insert(value).second;};
container.erase(std::remove_if(container.begin(), container.end(), predicate), container.end());
template<typename key_equal_t, typename hasher_t, bool order_conservative, typename container_t>
void remove_duplicates(container_t& container)
{
using value_type = typename container_t::value_type;
if constexpr(order_conservative)
{
std::unordered_set<value_type, hasher_t, key_equal_t> s;
const auto predicate = [&s](const value_type& value){return !s.insert(value).second;};
container.erase(std::remove_if(container.begin(), container.end(), predicate), container.end());
}
else
{
const std::unordered_set<value_type, hasher, key_equal_t> s(container.begin(), container.end());
container.assign(s.begin(), s.end());
}
}
Expects to be provided key_equal_t
and hasher_t
(and a bool
known a compile time indicating if you care about element being kept in the same order or not). I did not benchmark any of the two branches in this function so I do not know if one is significantly better than another, though this answer seems show manual insertion may be faster.
/// "Predicate" indicating if two values are considered duplicates or not
struct key_equal
{
bool operator()(const vec3& a, const vec3& b) const
{
// Remove identical vectors and their opposite
return (a == b) || (-a == b);
}
};
/// Hashes a vec3 by adding absolute values of its components.
struct hasher
{
std::size_t operator()(const vec3& v) const
{
return static_cast<std::size_t>(std::abs(v.x) + std::abs(v.y) + std::abs(v.z));
}
};
remove_duplicates<key_equal, hasher, order_conservative>(vec);
vec3 v1{1, 1, 0}; // Keep
vec3 v2{0, 1, 0}; // Keep
vec3 v3{1, -1, 0}; // Keep
vec3 v4{-1, -1, 0}; // Remove
vec3 v5{1, 1, 0}; // Remove
std::vector vec{v1, v2, v3, v4, v5};
// ...<print vec values>
remove_duplicates<key_equal, hasher, false>(vec);
// ... <print vec values>
Output (before / after):
(1, 1, 0) (0, 1, 0) (1, -1, 0) (-1, -1, 0) (1, 1, 0)
(1, -1, 0) (0, 1, 0) (1, 1, 0)
// ... <print vec values>
remove_duplicates<key_equal, hasher, true>(vec);
// ... <print vec values>
Output (before / after):
(1, 1, 0) (0, 1, 0) (1, -1, 0) (-1, -1, 0) (1, 1, 0)
(1, 1, 0) (0, 1, 0) (1, -1, 0)
Upvotes: 1
Reputation: 238311
The function in which it is used requires that no couple of values {v1, v2} in that vector fills v1 == -v2
but could not find any way to have a struct filing named requirements Compare for that application (which is needed for both approaches).
It seems to me that you're trying to solve X, but this is the Y in your XY-problem.
It's fairly simple to implement an ordered comparator that satisfies the equality of -v == v
. Simply compare absolute values:
struct vec3
{
int x;
int y;
int z;
// normal comparison that treats -x != x
friend auto operator<=>(const vec3&, const vec3&) = default;
};
// declare in same namespace as vec3 for ADL
vec3 abs(const vec3& v) {
return {std::abs(v.x), std::abs(v.y), std::abs(v.z)};
}
struct abs_less {
template< class T, class U>
constexpr auto operator()( T&& lhs, U&& rhs ) const
-> decltype(std::forward<T>(lhs) < std::forward<U>(rhs))
{
using std::abs; // for integers
return abs(lhs) < abs(rhs); // this implementation assumes normal comparison operator
// you can implement logic directly here if that's not possible
}
};
This comparator works with both std::set
and std::sort
+ std::unique
. Example with set:
std::set<vec3, abs_less> example;
Of course, you could overload the operator<
directly and use std::less
, but usually I would recommend against non-defaulted operator overloads that have unusual behaviour.
Upvotes: 1