ideasman42
ideasman42

Reputation: 48198

Are there equivalents to slice::chunks/windows for iterators to loop over pairs, triplets etc?

It can be useful to iterate over multiple variables at once, overlapping (slice::windows), or not (slice::chunks).

This only works for slices; is it possible to do this for iterators, using tuples for convenience?

Something like the following could be written:

for (prev, next) in some_iter.windows(2) {
    ...
}

If not, could it be implemented as a trait on existing iterators?

Upvotes: 57

Views: 47236

Answers (4)

Shepmaster
Shepmaster

Reputation: 431809

It's possible to take chunks of an iterator using Itertools::tuples, up to a 12-tuple:

use itertools::Itertools; // 0.10.0

fn main() {
    let some_iter = vec![1, 2, 3, 4, 5, 6].into_iter();

    for (prev, next) in some_iter.tuples() {
        println!("{}--{}", prev, next);
    }
}

(playground)

1--2
3--4
5--6

If you don't know that your iterator exactly fits into the chunks, you can use Tuples::into_buffer to access any leftovers:

use itertools::Itertools; // 0.10.0

fn main() {
    let some_iter = vec![1, 2, 3, 4, 5].into_iter();

    let mut t = some_iter.tuples();
    for (prev, next) in t.by_ref() {
        println!("{}--{}", prev, next);
    }
    for leftover in t.into_buffer() {
        println!("{}", leftover);
    }
}

(playground)

1--2
3--4
5

It's also possible to take up to 12-tuple windows with Itertools::tuple_windows:

use itertools::Itertools; // 0.10.0

fn main() {
    let some_iter = vec![1, 2, 3, 4, 5, 6].into_iter();

    for (prev, next) in some_iter.tuple_windows() {
        println!("{}--{}", prev, next);
    }
}

(playground)

1--2
2--3
3--4
4--5
5--6

If you need to get partial chunks / windows, you can get

Upvotes: 51

Chayim Friedman
Chayim Friedman

Reputation: 71440

This is possible using the standard library only with arrays (not tuples but similar and even better convenience), using the Iterator::array_chunks() and Iterator::map_windows() functions. Note there isn't a general array_windows() function, because such function won't be zero-cost (it will have to clone the elements). Instead, you are provided with a reference to an array of elements, and if you want the behavior of array_windows() you can just clone it.

Upvotes: 0

Ross MacArthur
Ross MacArthur

Reputation: 5459

On stable

Since Rust 1.51 this is possible with const generics where the iterator yields constant size arrays [T; N] for any N.

I built the itermore crate which implements both of these, providing the array_chunks() and array_windows() methods under separate extension traits.

use itermore::IterArrayChunks; // 0.7

for [a, b, c] in some_iter.by_ref().array_chunks() {
    ...
}

let rem = some_iter.into_remainder();
use itermore::IterArrayWindows; // 0.7

for [prev, next] in some_iter.array_windows() {
    ...
}

Using the example given in the Itertools answer:

use itermore::IterArrayChunks; // 0.7

fn main() {
    let some_iter = vec![1, 2, 3, 4, 5, 6].into_iter();

    for [prev, next] in some_iter.array_chunks() {
        println!("{}--{}", prev, next);
    }
}

This outputs

1--2
3--4
5--6

Most times the array size can be inferred but you can also specific it explicitly. Additionally, any reasonable size N can be used, there is no limit like in the Itertools case.

use itermore::IterArrayWindows; // 0.7

fn main() {
    let mut iter = vec![1, 2, 3, 4, 5, 6].into_iter().array_windows::<5>();
    println!("{:?}", iter.next());
    println!("{:?}", iter.next());
    println!("{:?}", iter.next());
}

This outputs

Some([1, 2, 3, 4, 5])
Some([2, 3, 4, 5, 6])
None

Note: array_windows() uses clone to yield elements multiple times so its best used for references and cheap to copy types.

On nightly

The chunks version is now available on nightly under the name array_chunks

#![feature(iter_array_chunks)]

for [a, b, c] in some_iter.array_chunks() {
    ...
}

And it handles remainders nicely:

#![feature(iter_array_chunks)]

for [a, b, c] in some_iter.by_ref().array_chunks() {
    ...
}

let rem = some_iter.into_remainder();

Performance

The iterator versions are particularly useful if you are not iterating over some contiguous collection. However, depending on your use-case you might find that collecting into a Vec first and using the slice methods might be faster even including the time allocate the iterator into a Vec. This is particularly true in the case of array_windows where the elements need to be cloned.

Upvotes: 20

Matthieu M.
Matthieu M.

Reputation: 300209

TL;DR: The best way to have chunks and windows on an arbitrary iterator/collection is to first collect it into a Vec and iterate over that.


The exact syntax requested is impossible in Rust.

The issue is that in Rust, a function's signature is depending on types, not values, and while Dependent Typing exists, there are few languages that implement it (it's hard).

This is why chunks and windows return a sub-slice by the way; the number of elements in a &[T] is not part of the type and therefore can be decided at run-time.


Let's pretend you asked for: for slice in some_iter.windows(2) instead then.

Where would the storage backing this slice live?

It cannot live:

  • in the original collection because a LinkedList doesn't have a contiguous storage
  • in the iterator because of the definition of Iterator::Item, there is no lifetime available

So, unfortunately, slices can only be used when the backing storage is a slice.


If dynamic allocations are accepted, then it is possible to use Vec<Iterator::Item> as the Item of the chunking iterator.

struct Chunks<I: Iterator> {
    elements: Vec<<I as Iterator>::Item>,
    underlying: I,
}

impl<I: Iterator> Chunks<I> {
    fn new(iterator: I, size: usize) -> Chunks<I> {
        assert!(size > 0);

        let mut result = Chunks {
           underlying: iterator, elements: Vec::with_capacity(size)
        };
        result.refill(size);
        result
    }

    fn refill(&mut self, size: usize) {
        assert!(self.elements.is_empty());

        for _ in 0..size {
            match self.underlying.next() {
                Some(item) => self.elements.push(item),
                None => break,
            }
        }
    }
}

impl<I: Iterator> Iterator for Chunks<I> {
    type Item = Vec<<I as Iterator>::Item>;

    fn next(&mut self) -> Option<Self::Item> {
        if self.elements.is_empty() {
            return None;
        }

        let new_elements = Vec::with_capacity(self.elements.len());
        let result = std::mem::replace(&mut self.elements, new_elements);

        self.refill(result.len());

        Some(result)
    }
}

fn main() {
    let v = vec!(1, 2, 3, 4, 5);

    for slice in Chunks::new(v.iter(), 2) {
        println!("{:?}", slice);
    }
}

Will return:

[1, 2]
[3, 4]
[5]

The canny reader will realize that I surreptitiously switched from windows to chunks.

windows is more difficult, because it returns the same element multiple times which require that the element be Clone. Also, since it needs returning a full Vec each time, it will need internally to keep a Vec<Vec<Iterator::Item>>.

This is left as an exercise to the reader.


Finally, a note on performance: all those allocations are gonna hurt (especially in the windows case).

The best allocation strategy is generally to allocate a single chunk of memory and then live off that (unless the amount is really massive, in which case streaming is required).

It's called collect::<Vec<_>>() in Rust.

And since the Vec has a chunks and windows methods (by virtue of implementing Deref<Target=[T]>), you can then use that instead:

for slice in v.iter().collect::<Vec<_>>().chunks(2) {
    println!("{:?}", slice);
}

for slice in v.iter().collect::<Vec<_>>().windows(2) {
    println!("{:?}", slice);
}

Sometimes the best solutions are the simplest.

Upvotes: 36

Related Questions