maestroviktorin
maestroviktorin

Reputation: 33

Compound `HashSet` operations in Rust OR How to get an explicit difference/union of `HashSet`s in Rust

I want perform the following, in pseudocode:

(a, b, c) = (HashSet(...), HashSet(...), HashSet(...))

(a, b, c) = (a - b - c, b - a - c, c - a - b)

In Rust I tried something like this:

fn get_random_set(...) -> HashSet<String> {
    ...
}

// Sets of randomly generated "words" that define behavior of the whole program.
let action_plus: HashSet<String> = get_random_set();
let action_minus: HashSet<String> = get_random_set();
let action_new_line: HashSet<String> = get_random_set();

Now we are to exclude all the common "words" from these HashSets. I learned that difference and union methods return Difference and Union iterators. And if I do this:

let action_plus = HashSet::from(action_minus.union(&action_new_line).collect::<Vec<String>>());

I receive this:

the trait `From<Vec<String>>` is not implemented for `HashSet<_, _>`

How to deal with this problem?

Upvotes: 3

Views: 668

Answers (3)

Sven Marnach
Sven Marnach

Reputation: 602485

Your goal is to remove any element that's in more than one set from all the sets. Since you are only removing elements, you don't need to clone any strings, and you can perform all operations in place. If your sets are called a, b and c, you can use this code:

a.retain(|x| b.remove(x) | c.remove(x));
b.retain(|x| c.remove(x));

This will first remove any element in the intersections between a and b as well as a and c, respectively, from both sets, and then all elements in the intersection between b and c. Overall, this should be much faster than the approaches discussed in the other answers (though I didn't do any benchmarks).

Note that it's important that the first line uses the non-shortcircuiting | operator rather than ||. The last operand, c.remove(x), obviously has a side effect, so we can't skip it. Otherwise, elements that are in all three sets would not be removed from c.

Upvotes: 2

Aleksander Krauze
Aleksander Krauze

Reputation: 6165

Collect directly into HashSet instead of Vec using impl FromIterator for HashSet.

That is:

let action_plus: HashSet<_> = action_minus.union(&action_new_line).collect();

As Chayim said in the comments you could also use | (BitOr) operator, since HashSet implements it and as the docs say it:

Returns the union of self and rhs as a new HashSet<T, S>

If you look at it's implementation you can see, that it does basically the same as the code above (although it clones items from those sets):

impl<T, S> BitOr<&HashSet<T, S>> for &HashSet<T, S>
where
    T: Eq + Hash + Clone,
    S: BuildHasher + Default,
{
    type Output = HashSet<T, S>;

    fn bitor(self, rhs: &HashSet<T, S>) -> HashSet<T, S> {
        self.union(rhs).cloned().collect()
    }
}

HasSet also implements BitAnd (&) operator for creating an intersection, Sub (-) for creating a set difference and BitXor (^) for creating a symmetric difference.

You can use those convenience operators, however they can be off-putting when one does not expect them, so use them carefully.

Upvotes: 5

drewtato
drewtato

Reputation: 12812

You can collect directly into a HashSet.

let a = get_random_set();
let b = get_random_set();
let c = get_random_set();
let a_minus_b_minus_c = a
    .difference(&b)
    .cloned()
    .collect::<HashSet<_>>()
    .difference(&c)
    .cloned()
    .collect::<HashSet<_>>();

cloned is necessary to convert the output from HashSet<&String> to HashSet<String>, since difference requires sets of the same type. You don't need the last cloned if you're okay with a_minus_b_minus_c being HashSet<&String>.

However, it's easier to just use the subtraction operator. HashSet has an implementation of Sub that takes both operands by reference, so you need to add a reference each time.

let a_minus_b_minus_c = &(&a - &b) - &c;

There's also:

  • BitAnd for intersection &a & &b
  • BitOr for union &a | &b
  • BitXor for symmetric difference &a ^ &b

Note that all these clone the values in the set and return a new set, so they're not optimally efficient. If you need more performance, you could do the operation manually by modifying the first set with retain (difference, intersection) or extend (union), or by manually making a new set that holds references.

Upvotes: 3

Related Questions