user17004502

Reputation: 115

How to flatten iterator that uses references

I am using the memmap2 crate to read some large binary files, and I am using the midasio library which provides some "viewer" structures that just reference inner structures in the byte slice.

From a slice of bytes (the memory map) I can create a FileView, with which I can iterate over EventViews, with which I can in turn iterate over BankViews. All of these just reference the underlying memory-mapped slice.

Iterating through the BankViews of a set of files is usually trivial with nested loops:

Minimal working example:

Cargo.toml

[dependencies]
midasio = "0.3"
memmap2 = "0.5"

and main.rs

use std::path::PathBuf;
use std::fs::File;
use memmap2::Mmap;
use midasio::read::file::FileView;

fn main() {
    let args: Vec<PathBuf> = Vec::new(); // Just the name of some files
    for path in args {
        let file = File::open(path).unwrap();
        let mmap = unsafe { Mmap::map(&file).unwrap() };

        let file_view = FileView::try_from(&mmap[..]).unwrap();
        for event_view in &file_view {
            for _bank_view in &event_view {
                // Here I am just iterating through all the BankViews
            }
        }
    }
}

I need to "flatten" all these into a single iterator such that whenever I call next() it has the exact same behavior as the nested loop above. How can I do this?

I need to do it because I want to use the Cursive library and loop through BankViews by pressing a "next" button. So I need to control each "next" with a single function that, hopefully, just calls next on the massive iterator.

I tried

use std::path::PathBuf;
use std::fs::File;
use memmap2::Mmap;
use midasio::read::file::FileView;

fn main() {
    let args: Vec<PathBuf> = Vec::new();
    let iterator = args
        .iter()
        .map(|path| {
            let file = File::open(path).unwrap();
            let mmap = unsafe { Mmap::map(&file).unwrap() };

            FileView::try_from(&mmap[..]).unwrap()
        })
        .flat_map(|file_view| file_view.into_iter())
        .flat_map(|event_view| event_view.into_iter());
}

And this gives me the errors:

error[E0515]: cannot return value referencing local variable `mmap`
  --> src/main.rs:14:13
   |
14 |             FileView::try_from(&mmap[..]).unwrap()
   |             ^^^^^^^^^^^^^^^^^^^^----^^^^^^^^^^^^^^
   |             |                   |
   |             |                   `mmap` is borrowed here
   |             returns a value referencing data owned by the current function

error[E0515]: cannot return reference to function parameter `file_view`
  --> src/main.rs:16:31
   |
16 |         .flat_map(|file_view| file_view.into_iter())
   |                               ^^^^^^^^^^^^^^^^^^^^^ returns a reference to data owned by the current function

error[E0515]: cannot return reference to function parameter `event_view`
  --> src/main.rs:17:32
   |
17 |         .flat_map(|event_view| event_view.into_iter());
   |                                ^^^^^^^^^^^^^^^^^^^^^^ returns a reference to data owned by the current function

For more information about this error, try `rustc --explain E0515`.
error: could not compile `ugly_iteration` due to 3 previous errors

Upvotes: 4

Views: 691

Answers (1)

Chayim Friedman

Reputation: 71350

This is problematic. Because the IntoIterator impls borrow self, you need to hold both the iterable and the iterator together, and that creates a self-referential struct. See "Why can't I store a value and a reference to that value in the same struct?".

It looks to me, though I haven't dug deep, that this is not necessary and is actually the result of a design flaw in midasio. You can't do much about that other than patching the library or sending a PR and hoping it is accepted soon. If you want to change it, I think it is enough to change &'a FileView<'a> and &'a EventView<'a> to &'_ FileView<'a> and &'_ EventView<'a> respectively, though I'm unsure. See https://github.com/DJDuque/midasio/pull/8.
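To illustrate the lifetime change suggested above, here is a minimal sketch that uses only the standard library. ChunkView, Pieces and pieces_of_all are hypothetical stand-ins invented for this example, not midasio's types, and Vec<u8> stands in for a memory map. The point is only that once the borrow of the view is decoupled from the borrow of the data, flat_map can drop the temporary view.

```rust
// Hypothetical stand-in for a view type such as FileView; not midasio's API.
struct ChunkView<'a> {
    bytes: &'a [u8],
}

// Iterator yielding two-byte pieces that borrow from the underlying data.
struct Pieces<'a> {
    rest: &'a [u8],
}

impl<'a> Iterator for Pieces<'a> {
    type Item = &'a [u8];

    fn next(&mut self) -> Option<Self::Item> {
        if self.rest.len() < 2 {
            return None; // a trailing partial piece is ignored
        }
        let (head, tail) = self.rest.split_at(2);
        self.rest = tail;
        Some(head)
    }
}

// 'v (the borrow of the view) is independent of 'a (the borrow of the data).
// Writing `&'a ChunkView<'a>` here instead would tie the two together, and
// the `flat_map` below would then be rejected by the borrow checker.
impl<'v, 'a> IntoIterator for &'v ChunkView<'a> {
    type Item = &'a [u8];
    type IntoIter = Pieces<'a>;

    fn into_iter(self) -> Self::IntoIter {
        Pieces { rest: self.bytes }
    }
}

fn pieces_of_all<'a>(buffers: &'a [Vec<u8>]) -> impl Iterator<Item = &'a [u8]> + 'a {
    buffers
        .iter()
        .map(|buf| ChunkView { bytes: &buf[..] })
        // `view` is dropped when the closure returns, but the returned
        // iterator only holds data borrowed for 'a, so this is accepted.
        .flat_map(|view| (&view).into_iter())
}
```

Whether the same change is sufficient for midasio depends on how its iterators are implemented internally, so treat this as an illustration of the idea rather than a description of the library.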

I don't think there is a good solution. Using iterator adapters is unlikely to work, and creating your own iterator type will require unsafe code or at the very least using a crate like ouroboros.


Edit: With my PR #8, the code in the question still doesn't work verbatim, because each Mmap is dropped at the end of the map() closure while the views still need to access it. This is fixable pretty easily by collecting all the Mmaps into a Vec first:

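use std::fs::File;
use std::path::PathBuf;
use memmap2::Mmap;
use midasio::read::file::FileView;

fn main() {
    let args: Vec<PathBuf> = Vec::new();
    // Collect the maps first so they outlive every view that borrows from them.
    let mmaps = args
        .iter()
        .map(|path| {
            let file = File::open(path).unwrap();
            unsafe { Mmap::map(&file).unwrap() }
        })
        .collect::<Vec<_>>();
    // The views now borrow from `mmaps`, which lives until the end of main().
    let iterator = mmaps
        .iter()
        .map(|mmap| FileView::try_from(&mmap[..]).unwrap())
        .flat_map(|file_view| file_view.into_iter())
        .flat_map(|event_view| event_view.into_iter());
}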

Returning this iterator from a function is still not going to work, unfortunately.
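One possible way around that, sketched here under assumptions, is to keep the storage in the caller and have the function only borrow it, so that the returned iterator never outlives the data. In this sketch Vec<u8> stands in for Mmap and slice::chunks stands in for the event and bank iterators; flat_banks and press_next are hypothetical names, not part of midasio or Cursive.

```rust
// Vec<u8> stands in for Mmap, and `chunks` stands in for the event and bank
// iterators; both are assumptions for illustration, not midasio's API.
fn flat_banks<'a>(storage: &'a [Vec<u8>]) -> impl Iterator<Item = &'a [u8]> + 'a {
    storage
        .iter()
        .flat_map(|file| file.chunks(4)) // "events" inside a "file"
        .flat_map(|event| event.chunks(2)) // "banks" inside an "event"
}

// Hypothetical helper: advance the flattened iterator once, the way a
// "next" button handler might.
fn press_next<'a>(banks: &mut impl Iterator<Item = &'a [u8]>) -> Option<&'a [u8]> {
    banks.next()
}
```

The caller creates the storage, calls flat_banks(&storage), and keeps both alive for as long as it needs to step through the items. This only works because the stand-in iterators borrow from the data rather than from a temporary view, which is the property the lifetime change above is meant to give.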

Upvotes: 2
