Brogolem35
Brogolem35

Reputation: 151

Bumpalo Arena Allocator - Vector Lifetime Issue

I am trying to implement a fast Markov Chain made specifically for text generation. Every state of the chain holds a ChainItem which is just a wrapper for Vec. I want to use an arena allocator to hold these ChainItems to avoid the drop of thousands of ChainItems and faster allocation. I decided to use bumpalo and things are not going so well.

markov.rs:

/// Represents a Markov Chain that is designed to generate text.
///
/// [Wikipedia](https://en.wikipedia.org/wiki/Markov_chain)
pub struct MarkovChain<'a> {
    pub items: HashMap<String, ChainItem<'a>>,
    pub state_size: usize,
    regex: &'a Regex,
    arena: Bump,
}

impl<'a> MarkovChain<'a> {
    /// Create an empty MarkovChain.
    ///
    /// The hash map of the MarkovChain is initially created with a capacity of 0, so it will not allocate until it
    /// is first inserted into.
    #[allow(dead_code)]
    pub fn new(state_size: usize, regex: &Regex) -> MarkovChain {
        MarkovChain {
            items: HashMap::<String, ChainItem>::new(),
            state_size,
            regex,
            arena: Bump::new(),
        }
    }

    /// Create an empty `HashMap` with the specified capacity.
    ///
    /// The hash map of the MarkovChain will be able to hold at least `capacity` elements without
    /// reallocating. If `capacity` is 0, the hash map will not allocate.
    pub fn with_capacity(state_size: usize, capacity: usize, regex: &Regex) -> MarkovChain {
        MarkovChain {
            items: HashMap::with_capacity(capacity),
            state_size,
            regex,
            arena: Bump::new(),
        }
    }

    /// Add text as training data.
    pub fn add_text(&'a mut self, text: &str) {
        let tokens = self.regex.find_iter(text);

        let mut prev: VecDeque<&str> = VecDeque::with_capacity(self.state_size + 100);
        let mut prev_buf: String = String::with_capacity(255);
        for t in tokens {
            for i in 1..=prev.len() {
                prev_buf.clear();
                for (i, s) in prev.iter().rev().take(i).rev().enumerate() {
                    if i > 0 {
                        prev_buf.push(' ')
                    }

                    prev_buf.push_str(s);
                }

                let t = ustr(t.as_str());

                if let Some(ci) = self.items.get_mut(&prev_buf) {
                    ci.add(t);
                } else {
                    self.items.insert(prev_buf.clone(), ChainItem::new(t, &self.arena));
                }
            }

            if prev.len() == self.state_size {
                prev.pop_front();
            }
            prev.push_back(t.as_str());
        }
    }

// Bla bla
}

/// Wrapper for Vec<Ustr> to make some operations easier.
pub struct ChainItem<'a> {
    pub items: bumpalo::collections::Vec<'a, Ustr>,
}

impl<'a> ChainItem<'a> {
    /// Create a ChainItem, which will also contain `s`.
    pub fn new(s: Ustr, bump: &'a Bump) -> ChainItem<'a> {
        ChainItem { items: bumpalo::vec![in bump;s] }
    }

    /// Add item.
    pub fn add(&mut self, s: Ustr) {
        self.items.push(s);
    }

    /// Get a random item.
    pub fn get_rand(&self) -> Ustr {
        *self.items
            // get a random item from the Vec
            .choose(&mut rand::thread_rng())
            .unwrap()
    }
}

When I try to do this on main

    // Reads every file into a string
    let contents = files.filter_map(|f| read_to_string(f.path()).ok());

    // Creating the Markov Chain
    let markov_chain = contents.fold(
        MarkovChain::with_capacity(2, 8_000_000, &WORD_REGEX),
        |mut a, s| {
            a.add_text(&s);
            a
        },
    );

I get this:

error[E0597]: `a` does not live long enough
  --> src/main.rs:39:4
   |
38 |         |mut a, s| {
   |          -----
   |          |
   |          binding `a` declared here
   |          has type `markov::MarkovChain<'1>`
39 |             a.add_text(&s);
   |             ^-------------
   |             |
   |             borrowed value does not live long enough
   |             argument requires that `a` is borrowed for `'1`
40 |             a
41 |         },
   |         - `a` dropped here while still borrowed

error[E0505]: cannot move out of `a` because it is borrowed
  --> src/main.rs:40:4
   |
38 |         |mut a, s| {
   |          -----
   |          |
   |          binding `a` declared here
   |          has type `markov::MarkovChain<'1>`
39 |             a.add_text(&s);
   |             --------------
   |             |
   |             borrow of `a` occurs here
   |             argument requires that `a` is borrowed for `'1`
40 |             a
   |             ^ move out of `a` occurs here

Some errors have detailed explanations: E0505, E0597.
For more information about an error, try `rustc --explain E0505`.
error: could not compile `markov-chain` (bin "markov-chain") due to 2 previous errors

Cargo.toml:

[package]
name = "markov-chain"
version = "0.1.0"
edition = "2021"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
hashbrown = "0.14.5"
once_cell = "1.19.0"
rand = "0.8.5"
regex = "1.10.4"
ustr = "1.0.0"
bumpalo = { version = "3.16.0", features = ["collections"] }

[target.'cfg(not(target_env = "msvc"))'.dependencies]
tikv-jemallocator = "0.5"

[package.metadata.clippy]
# This sets all pedantic lints to be treated as errors
deny = ["clippy::pedantic"]

Am I doing somethimng wrong or is this a limitation of arenas or bumpalo? Also, I am open for suggestions regarding which arena to use if some other arena allocator is more suited for this usecase.

Upvotes: 0

Views: 38

Answers (0)

Related Questions