HiDefender
HiDefender

Reputation: 2368

How can I modify properties in a HashSet that are not part of the hash computation?

I have a struct that contains a unique id and uses that id for its hash:

use std::borrow::Borrow;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

type Id = u32;

#[derive(Debug, Eq)]
struct Foo {
    id: Id,
    other_data: u32,
}

impl PartialEq for Foo {
    fn eq(&self, other: &Foo) -> bool {
        self.id == other.id
    }
}

impl Hash for Foo {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.id.hash(state);
    }
}

impl Borrow<Id> for Foo {
    fn borrow(&self) -> &Id {
        &self.id
    }
}

I understand that I cannot modify the value of Foo::id once I've put it into the HashSet because that would change the hash. However, I would like to modify Foo::other_data. I know I could remove it from the HashSet, modify it, and insert it again, but a method like get_mut() would be so much cleaner. Is there a way to accomplish something like this:

fn main() {
    let mut baz = HashSet::new();
    baz.insert(Foo {
        id: 1,
        other_data: 2,
    });

    if let Some(x) = baz.get_mut(&1) {
        *x = 3;
    }
}

Is this an anti-pattern; should I be using a HashMap instead?

Related to this question.

Upvotes: 2

Views: 1200

Answers (2)

HiDefender
HiDefender

Reputation: 2368

I believe unsafe code is the best route in this case.

impl Foo {
    fn set_other_data(set: &mut HashSet<Foo>, id: &Id, data: u32) -> bool{
        match set.get(id) {
            Some(x) => {
                let p: *const Foo = x;
                let q: *mut Foo = p as *mut Foo;
                unsafe {
                    (*q).other_data = data;
                }
                return true;
            }
            None => return false,
        }
    }
}

fn main() {
    let mut baz = HashSet::new();
    baz.insert(Foo {
        id: 1,
        other_data: 2,
    });

    Foo::set_other_data(&mut baz, &1, 3);
    assert_eq!(3, baz.get(&1).unwrap().other_data);
}

As Shepmaster quoted:

It is a logic error for an item to be modified in such a way that the item's hash, as determined by the Hash trait, or its equality, as determined by the Eq trait, changes while it is in the set. This is normally only possible through Cell, RefCell, global state, I/O, or unsafe code.

In this case other_data is not used by the Hash or Eq traits. So it can safely be mutated. The largest danger is that at a later point Hash for Foo or Eq for Foo would be modified and include other_data.

There is no danger of data races because the HashSet<Foo> is mutably borrowed.

Other options:

Decomposing: This works when Foo has only 2 elements, but suppose Foo contains many elements. Do you decompose Foo into all of its individual elements (seems messy) or create sub-structs within Foo (code bloat).

Encapsulation: Silvio Mayolo suggested encapsulating Foo in a HashSet like interface that internally uses HashMap. This keeps the API clean and uses only safe code, but seems like more programming then is necessary.

I would appreciate your feedback and if this seems reasonable I can put in a feature request for an unsafe fn get_mut() for HashSet.

Upvotes: -2

Shepmaster
Shepmaster

Reputation: 430554

This is not possible with your current data structure.

HashSet deliberately does not provide methods to mutate values. As you have alluded to, mutating a value in a HashSet (or the key in a HashMap) will invalidate the hash in the majority of cases. The API encourages proper usage and even makes mention of this:

It is a logic error for an item to be modified in such a way that the item's hash, as determined by the Hash trait, or its equality, as determined by the Eq trait, changes while it is in the set. This is normally only possible through Cell, RefCell, global state, I/O, or unsafe code.

This alludes to one way that you can solve your problem, by using interior mutability:

use std::cell::Cell;

#[derive(Debug, Eq)]
struct Foo {
    id: Id,
    other_data: Cell<u32>,
}
fn main() {
    let mut baz = HashSet::new();
    baz.insert(Foo {
        id: 1,
        other_data: Cell::new(2),
    });

    if let Some(x) = baz.get(&1) {
        x.other_data.set(3);
    }
}

This is a reasonable thing to do, but I wouldn't be thrilled about doing it. Instead, I would allow my type to be decomposed into a key and a value and store it in a HashMap, as mentioned. Something like


impl Foo {
    // or insert_into_hashmap(self, &mut HashMap<Id, u32>)
    fn into_key_value(self) -> (Id, u32) {
        (self.id, self.other_data)
    }

    // Maybe a
    //
    // fn from_key_value(&'a Id, &'a u32) -> Self
    // or
    // fn from_hashmap(Id, &HashMap<Id, u32>) -> Self
}

// Maybe a
//
// struct FooRef<'a> { (or FooRefMut?) 
//     id: &'a Id,
//     other_data: &'a u32,
// }
//
// With a
// fn from_key_value(&'a Id, &'a u32) -> Self
// or
// fn from_hashmap(Id, &HashMap<Id, u32>) -> Self

fn main() {
    let mut baz = HashMap::new();
    let f = Foo {
        id: 1,
        other_data: 2,
    };
    let (k, v) = f.into_key_value();
    baz.insert(k, v);

    // See also HashMap::get_key_value
    if let Some(v) = baz.get_mut(&1) {
        *v = 3;
    }
}

Upvotes: 11

Related Questions