Reputation: 881
I want to remove duplicate elements from an array:
use itertools::Itertools;
use std::collections::HashSet;
#[derive(Debug)]
struct Person {
name: String,
age: u32,
}
fn main() {
let arr = [
Person { name: "aaa".to_string(), age: 10 },
Person { name: "bbb".to_string(), age: 20 },
Person { name: "bbb".to_string(), age: 20 },
Person { name: "ccc".to_string(), age: 30 },
];
// Way 1:
let arr2 = {
let names: Vec<_> = arr.iter().map(|v| v.name.clone()).unique().collect();
names
.iter()
.map(|name| arr.iter().find(|person| &person.name == name).unwrap())
.collect::<Vec<_>>()
};
dbg!(arr2);
// Way 2:
let arr2 = {
let mut names = HashSet::new();
arr.iter()
.filter(|p| names.insert(p.name.clone()))
.collect::<Vec<_>>()
};
dbg!(arr2);
/*
expect:
[
Person{name: "aaa".to_string(), age: 10},
Person{name: "bbb".to_string(), age: 20},
Person{name: "ccc".to_string(), age: 30},
]
*/
}
Way 2 is simple compared to way 1, but is there anything simpler?
Upvotes: 9
Views: 25172
Reputation: 6299
My solution consuming the vector
let foo = vec![1, 2, 3, 1, 4, 1, 5];
eprintln!(
"{:?}",
foo.into_iter()
.collect::<std::collections::HashSet<u32>>()
.into_iter()
.collect::<Vec<u32>>()
);
The problem is that the output is not ordered as it was before
Upvotes: 3
Reputation: 444
This method operates in place, visiting each element exactly once in the original order, and preserves the order of the retained elements.
This works because retain only keeps items for which the predicate returns true, and insert only returns true if the item was not previously present in the set.
Since the vector is traversed in order, we end up keeping just the first occurrence of each item.
Fonts:
remove-duplicates-while-preserving-order-in-rust
how-to-implement-a-trait-for-a-vector-of-generic-type-vect
use std::collections::HashSet;
pub trait Unique<T> {
fn unique(&mut self);
}
impl<T> Unique<T> for Vec<T>
where
T: Clone + Eq + std::hash::Hash,
{
fn unique(self: &mut Vec<T>) {
let mut seen: HashSet<T> = HashSet::new();
self.retain(|item| seen.insert(item.clone()));
}
}
fn main() {
let mut vector = vec![1, 4, 3, 4, 5, 4, 3, 4, 2, 3];
println!("vector: {:?}", vector);
vector.unique();
println!("vector: {:?}", vector);
assert_eq!(vector, [1, 4, 3, 5, 2]);
}
The output:
vector: [1, 4, 3, 4, 5, 4, 3, 4, 2, 3]
vector: [1, 4, 3, 5, 2]
The Rust Playground.
Upvotes: 2
Reputation: 503
I would use a Hashmap to get an overview of existing values to retain. This is a slim approach that doesn't require external dependencies.
let mut arr = vec![1, 2, 2, 3, 3, 3, 4, 5, 5];
let mut seen: HashMap<i32, bool> = HashMap::new();
// Iterate over the array and mark values as "seen"
arr.retain(|&x| seen.insert(x, true).is_none());
println!("{:?}", arr); // -> [1, 2, 3, 4, 5]
If your array consists of struct or enum values you'll need to add derive(Hash)
.
It's actually this platform that taught me that approach quite a while ago when working with Lua tables. Unfortunately, I can't find the related article.
Upvotes: 2
Reputation: 3758
This is a simple way to remove duplicates if you don't mind, that the original insertion order gets lost. The resulting array is sorted:
let unique = std::collections::BTreeSet::from(arr);
let arr = unique.into_iter().collect::<Vec<_>>();
println!("{arr:?}");
Upvotes: 0
Reputation: 431479
With nightly Rust, this can be done in-place and with no additional memory allocation via slice::partition_dedup_by
:
#![feature(slice_partition_dedup)]
#[derive(Debug)]
struct Person {
name: String,
age: u32,
}
fn main() {
let mut arr = [
Person { name: "aaa".to_string(), age: 10 },
Person { name: "bbb".to_string(), age: 20 },
Person { name: "bbb".to_string(), age: 20 },
Person { name: "ccc".to_string(), age: 30 },
];
arr.sort_by(|a, b| a.name.cmp(&b.name));
let (unique, _) = arr.partition_dedup_by(|a, b| a.name == b.name);
dbg!(unique);
}
This doesn't actually remove the duplicate elements, it just moves them to the end of the original slice / array / Vec
. It's impossible to remove values from an array because an array has a fixed length.
See also:
Upvotes: 2
Reputation: 5565
There's a difference between the dedup
and unique
method in Itertools
, where the former operates on contiguous elements, i.e:
[1, 2, 2, 3, 4, 3, 2, 1].iter().dedup() // [1, 2, 3, 4, 3, 2, 1]
[1, 2, 2, 3, 4, 3, 2, 1].iter().unique() // [1, 2, 3, 4]
If you're looking to have unique elements by name
, unique_by
might do:
use itertools::Itertools;
#[derive(Debug)]
struct Person {
name: String,
age: u32,
}
fn main() {
let arr = [
Person { name: "aaa".to_string(), age: 10 },
Person { name: "bbb".to_string(), age: 20 },
Person { name: "bbb".to_string(), age: 20 }, // Duplicate
Person { name: "ccc".to_string(), age: 30 },
];
let res = arr.iter().unique_by(|p| &p.name).collect::<Vec<_>>();
}
[
Person { name: "aaa", age: 10 },
Person { name: "bbb", age: 20 },
Person { name: "ccc", age: 30 }
]
Upvotes: 20
Reputation: 36031
Maybe Itertools::unique
and Itertools::unique_by
help. They use a Hash
-based approach.
Upvotes: 2