Daniel Fillirini

How to solve a borrowing problem with a HashSet

I'm making a tokenizer and have run into some difficulties. To make the problem easier to understand, I'll briefly describe the tokenizer's components:

  1. Token. There are 3 types: Constant (fixed keywords, whose size and text are always known in advance), Literal, and Identifier (which seems to be the source of the problem). Here is the implementation:
#[derive(Clone, Copy)]
pub enum TokenType {
    Id,

    Algo, Struct, While,

    Int, Float,
}

pub trait Token {
    fn get_type(&self) -> TokenType;
    fn get_str(&self) -> &str; 
}

pub struct ConstToken {
    kind: TokenType
}
impl Token for ConstToken {
    fn get_type(&self) -> TokenType {
        self.kind
    }
    fn get_str(&self) -> &str {
        match self.kind {
            TokenType::Algo => "algo",
            TokenType::Struct => "struct",
            TokenType::While => "while",
            
            _ => panic!()
        }
    }
}
impl ConstToken {
    pub fn new(kind: TokenType) -> Self {
        ConstToken { kind }
    }
}

pub struct LitToken {
    kind: TokenType,
    txt: String
}
impl Token for LitToken {
    fn get_type(&self) -> TokenType {
        self.kind
    }
    fn get_str(&self) -> &str {
        &self.txt
    }
}
impl LitToken {
    pub fn new(kind: TokenType, txt: String) -> Self {
        LitToken { kind, txt }
    }
}

pub struct IdToken<'a> {
    txt: &'a str
}
impl<'a> Token for IdToken<'a> {
    fn get_type(&self) -> TokenType {
        TokenType::Id
    }
    fn get_str(&self) -> &str {
        self.txt
    }
}
impl<'a> IdToken<'a> {
    pub fn new(txt: &'a str) -> Self {
        IdToken { txt }
    }
}
  2. Tokenization function. I don't use regex; I walk over a stream of characters with loops. Since I expect many repeated identifiers in the source, each identifier token stores a &str, a reference to a single stored copy of the identifier's name. Here is the code implementing this behavior:
pub fn tokenize<'a>(src: &'a str, tokens: &mut Vec<Box<dyn Token + 'a>>, ids: &'a mut HashSet<String>)

...

loop {
    let (c, i) = match iter.next() {
        Some(res) => (res.1, res.0),
        None => {
            let str_id = &src[first..];
            if !ids.contains(str_id) {
                ids.insert(str_id.to_string()); // !ERROR
            }

            let token = IdToken::new(ids.get(str_id).unwrap());
            tokens.push(Box::new(token));
            break;
        }
    };

    if !c.is_alphabetic() {
        let str_id = &src[first..i];
        if !ids.contains(str_id) {
            ids.insert(str_id.to_string()); // !ERROR
        }

        let token = IdToken::new(ids.get(str_id).unwrap());
        tokens.push(Box::new(token));
        break;
    }
}
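Since src is already borrowed for 'a, every slice &src[first..i] is itself a &'a str that copies nothing. The following is a minimal sketch of an alternative, not the asker's design. It assumes identifiers are runs of alphabetic characters, as in the loop above, and scan_ids is a hypothetical name. The set holds those slices (HashSet<&'a str>) instead of owned Strings. Because &'a str is Copy, the value read back from the set can be copied out, so no borrow of the set outlives the lookup and the set can keep being mutated:

```rust
use std::collections::HashSet;

// Hypothetical helper: splits `src` into identifiers (runs of alphabetic
// characters, as in the question's loop) and deduplicates them so every
// occurrence of a name points at its first appearance in `src`.
fn scan_ids<'a>(src: &'a str) -> Vec<&'a str> {
    // The set stores borrowed slices of `src`, not owned Strings.
    let mut ids: HashSet<&'a str> = HashSet::new();
    let mut out = Vec::new();
    let mut start: Option<usize> = None;

    // A trailing sentinel character flushes an identifier that ends the input.
    for (i, c) in src.char_indices().chain(std::iter::once((src.len(), ' '))) {
        match (start, c.is_alphabetic()) {
            (None, true) => start = Some(i),
            (Some(first), false) => {
                let name = &src[first..i];
                // `insert` keeps the existing slice if the name is already
                // present; copying the `&'a str` out of `get` leaves no
                // borrow of `ids`, so the set can be mutated again later.
                ids.insert(name);
                out.push(*ids.get(name).unwrap());
                start = None;
            }
            _ => {}
        }
    }
    out
}

fn main() {
    let src = String::from("foo+bar foo");
    let names = scan_ids(&src);
    // Both "foo" entries point at the same bytes: src[0..3].
    assert!(std::ptr::eq(names[0], names[2]));
    println!("{:?}", names);
}
```

Every repeated name ends up pointing at the same bytes of src, so deduplication needs no allocation per name. The trade-off is that the tokens cannot outlive src.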

On both ids.insert(str_id.to_string()); lines (marked !ERROR), the compiler complains:

cannot borrow *ids as mutable because it is also borrowed as immutable due to object lifetime defaults, Box<dyn Token> actually means Box<(dyn Token + 'static)>

I don't really understand what the error means or how to fix it.
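The likely cause is independent of the trait objects. ids.get(str_id) returns a shared borrow of ids. Because IdToken::new stores that reference in a token that must live for 'a, the shared borrow must also last for 'a, which is the rest of the function. While that borrow is alive, the compiler rejects any later ids.insert. The following is a minimal sketch of one workaround, not a drop-in fix for the tokenizer. It assumes all identifiers can be collected in a first pass before any token is created, and intern_then_borrow is a hypothetical helper. Every insertion finishes before any reference into the set is taken:

```rust
use std::collections::HashSet;

// Hypothetical helper: `names` stands in for the identifiers found while
// scanning the source. All insertions happen before any borrow is taken.
fn intern_then_borrow<'s>(names: &[&str], ids: &'s mut HashSet<String>) -> Vec<&'s str> {
    // Pass 1: mutate the set while no references into it exist yet.
    for name in names {
        if !ids.contains(*name) {
            ids.insert(name.to_string());
        }
    }

    // Pass 2: downgrade to a shared reference; from here on the set is
    // only read, so any number of shared borrows can live for 's.
    let ids: &'s HashSet<String> = ids;
    names
        .iter()
        .map(move |name| ids.get(*name).unwrap().as_str())
        .collect()
}

fn main() {
    let mut ids = HashSet::new();
    let refs = intern_then_borrow(&["foo", "bar", "foo"], &mut ids);
    // Both "foo" entries point at the same String inside the set.
    assert!(std::ptr::eq(refs[0], refs[2]));
    println!("{:?}", refs);
}
```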

I would be very grateful for your help.

I also tried creating the HashSet and the token vector inside the function and returning them instead of passing them in as &mut parameters, but this did not help and only produced more errors.
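The note at the end of the error message refers to Rust's object lifetime defaults. When no bound is written, Box<dyn Token> means Box<dyn Token + 'static>, and an IdToken<'a> that borrows a string cannot be boxed into that type. Returning tokens from a function therefore needs the bound spelled out in the return type. Below is a minimal sketch with a trimmed-down trait. collect_tokens is a hypothetical name, and splitting on whitespace stands in for the real scanner:

```rust
// Hypothetical, trimmed-down version of the question's Token trait.
pub trait Token {
    fn get_str(&self) -> &str;
}

// Same shape as the question's IdToken: it borrows its text for 'a.
pub struct IdToken<'a> {
    txt: &'a str,
}

impl<'a> Token for IdToken<'a> {
    fn get_str(&self) -> &str {
        self.txt
    }
}

// With a bare `Box<dyn Token>` the return type would default to
// `Box<dyn Token + 'static>` and reject the borrowed IdToken<'a>;
// writing `+ 'a` ties the trait objects to the lifetime of `src`.
pub fn collect_tokens<'a>(src: &'a str) -> Vec<Box<dyn Token + 'a>> {
    src.split_whitespace()
        .map(|word| Box::new(IdToken { txt: word }) as Box<dyn Token + 'a>)
        .collect()
}

fn main() {
    let source = String::from("foo bar foo");
    let tokens = collect_tokens(&source);
    for token in &tokens {
        println!("{}", token.get_str());
    }
}
```

This only addresses the 'static note. Tokens that borrow from a HashSet created inside the same function still cannot be returned together with that set, because the set would be moved or dropped while it is borrowed.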

In general, it is important to me that each identifier's text is stored only once, in the HashSet, and is not copied into the Token.
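If the tokens should refer to the one copy kept in the set, a common technique is reference-counted interning. The set stores Rc<str>, and each token holds a clone of the Rc, which increments a counter instead of copying the text. This swaps in a different technique for the &'a str design above. The following is a minimal sketch that assumes single-threaded use (Arc<str> would be the thread-safe equivalent); RcIdToken and intern are hypothetical names:

```rust
use std::collections::HashSet;
use std::rc::Rc;

// Hypothetical token type: holds a shared handle to the interned name
// instead of a borrowed &'a str, so it has no lifetime parameter.
pub struct RcIdToken {
    txt: Rc<str>,
}

impl RcIdToken {
    pub fn get_str(&self) -> &str {
        &self.txt
    }
}

// Returns the single shared copy of `name`, allocating it only the first time.
pub fn intern(ids: &mut HashSet<Rc<str>>, name: &str) -> Rc<str> {
    if let Some(existing) = ids.get(name) {
        // Cloning an Rc increments a counter; the text itself is not copied.
        return Rc::clone(existing);
    }
    let shared: Rc<str> = Rc::from(name);
    ids.insert(Rc::clone(&shared));
    shared
}

fn main() {
    let mut ids: HashSet<Rc<str>> = HashSet::new();
    let mut tokens = Vec::new();
    for name in ["foo", "bar", "foo"] {
        tokens.push(RcIdToken { txt: intern(&mut ids, name) });
    }
    // Both "foo" tokens share one allocation.
    assert!(Rc::ptr_eq(&tokens[0].txt, &tokens[2].txt));
    for t in &tokens {
        println!("{}", t.get_str());
    }
    println!("{} distinct names", ids.len());
}
```

Because RcIdToken borrows nothing, it is 'static. Once it implements the question's Token trait, it can therefore be boxed as a plain Box<dyn Token>. The cost is one reference-count update per token.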
