Anunaki
Anunaki

Reputation: 881

Is there a simple way remove duplicate elements from an array?

I want to remove duplicate elements from an array:

use itertools::Itertools;
use std::collections::HashSet;

#[derive(Debug)]
struct Person {
    name: String,
    age: u32,
}

fn main() {
    let arr = [
        Person { name: "aaa".to_string(), age: 10 },
        Person { name: "bbb".to_string(), age: 20 },
        Person { name: "bbb".to_string(), age: 20 },
        Person { name: "ccc".to_string(), age: 30 },
    ];

    // Way 1:
    let arr2 = {
        let names: Vec<_> = arr.iter().map(|v| v.name.clone()).unique().collect();
        names
            .iter()
            .map(|name| arr.iter().find(|person| &person.name == name).unwrap())
            .collect::<Vec<_>>()
    };
    dbg!(arr2);

    // Way 2:
    let arr2 = {
        let mut names = HashSet::new();
        arr.iter()
            .filter(|p| names.insert(p.name.clone()))
            .collect::<Vec<_>>()
    };
    dbg!(arr2);

    /*
    expect:
        [
            Person{name: "aaa".to_string(), age: 10},
            Person{name: "bbb".to_string(), age: 20},
            Person{name: "ccc".to_string(), age: 30},
        ]
    */
}

Way 2 is simple compared to way 1, but is there anything simpler?

Upvotes: 9

Views: 25172

Answers (7)

geckos
geckos

Reputation: 6299

My solution consuming the vector

    let foo = vec![1, 2, 3, 1, 4, 1, 5];
    eprintln!(
        "{:?}",
        foo.into_iter()
            .collect::<std::collections::HashSet<u32>>()
            .into_iter()
            .collect::<Vec<u32>>()
    );

The problem is that the output is not ordered as it was before

Upvotes: 3

Claudio Fsr
Claudio Fsr

Reputation: 444

This method operates in place, visiting each element exactly once in the original order, and preserves the order of the retained elements.

This works because retain only keeps items for which the predicate returns true, and insert only returns true if the item was not previously present in the set.

Since the vector is traversed in order, we end up keeping just the first occurrence of each item.

Fonts:

remove-duplicates-while-preserving-order-in-rust

how-to-implement-a-trait-for-a-vector-of-generic-type-vect

use std::collections::HashSet;

pub trait Unique<T> {
    fn unique(&mut self);
}

impl<T> Unique<T> for Vec<T>
where
    T: Clone + Eq + std::hash::Hash,
{
    fn unique(self: &mut Vec<T>) {
        let mut seen: HashSet<T> = HashSet::new();
        self.retain(|item| seen.insert(item.clone()));
    }
}

fn main() {
    let mut vector = vec![1, 4, 3, 4, 5, 4, 3, 4, 2, 3];
    println!("vector: {:?}", vector);

    vector.unique();
    println!("vector: {:?}", vector);

    assert_eq!(vector, [1, 4, 3, 5, 2]);
}

The output:

vector: [1, 4, 3, 4, 5, 4, 3, 4, 2, 3]
vector: [1, 4, 3, 5, 2]

The Rust Playground.

Upvotes: 2

tenxsoydev
tenxsoydev

Reputation: 503

I would use a Hashmap to get an overview of existing values to retain. This is a slim approach that doesn't require external dependencies.

let mut arr = vec![1, 2, 2, 3, 3, 3, 4, 5, 5];
let mut seen: HashMap<i32, bool> = HashMap::new();

// Iterate over the array and mark values as "seen"
arr.retain(|&x| seen.insert(x, true).is_none());

println!("{:?}", arr); // -> [1, 2, 3, 4, 5]

playground

If your array consists of struct or enum values you'll need to add derive(Hash).

It's actually this platform that taught me that approach quite a while ago when working with Lua tables. Unfortunately, I can't find the related article.

Upvotes: 2

Kaplan
Kaplan

Reputation: 3758

This is a simple way to remove duplicates if you don't mind, that the original insertion order gets lost. The resulting array is sorted:

let unique = std::collections::BTreeSet::from(arr);
let arr = unique.into_iter().collect::<Vec<_>>();
println!("{arr:?}");

Playground

Upvotes: 0

Shepmaster
Shepmaster

Reputation: 431479

With nightly Rust, this can be done in-place and with no additional memory allocation via slice::partition_dedup_by:

#![feature(slice_partition_dedup)]

#[derive(Debug)]
struct Person {
    name: String,
    age: u32,
}

fn main() {
    let mut arr = [
        Person { name: "aaa".to_string(), age: 10 },
        Person { name: "bbb".to_string(), age: 20 },
        Person { name: "bbb".to_string(), age: 20 },
        Person { name: "ccc".to_string(), age: 30 },
    ];

    arr.sort_by(|a, b| a.name.cmp(&b.name));
    let (unique, _) = arr.partition_dedup_by(|a, b| a.name == b.name);
    
    dbg!(unique);
}

This doesn't actually remove the duplicate elements, it just moves them to the end of the original slice / array / Vec. It's impossible to remove values from an array because an array has a fixed length.

See also:

Upvotes: 2

Jason
Jason

Reputation: 5565

There's a difference between the dedup and unique method in Itertools, where the former operates on contiguous elements, i.e:

[1, 2, 2, 3, 4, 3, 2, 1].iter().dedup()  // [1, 2, 3, 4, 3, 2, 1]
[1, 2, 2, 3, 4, 3, 2, 1].iter().unique() // [1, 2, 3, 4]

If you're looking to have unique elements by name, unique_by might do:

use itertools::Itertools;

#[derive(Debug)]
struct Person {
    name: String,
    age: u32,
}

fn main() {
    let arr = [
        Person { name: "aaa".to_string(), age: 10 },
        Person { name: "bbb".to_string(), age: 20 },
        Person { name: "bbb".to_string(), age: 20 }, // Duplicate
        Person { name: "ccc".to_string(), age: 30 },
    ];

    let res = arr.iter().unique_by(|p| &p.name).collect::<Vec<_>>();
}
[
  Person { name: "aaa", age: 10 },
  Person { name: "bbb", age: 20 },
  Person { name: "ccc", age: 30 }
]

Upvotes: 20

phimuemue
phimuemue

Reputation: 36031

Maybe Itertools::unique and Itertools::unique_by help. They use a Hash-based approach.

Upvotes: 2

Related Questions