eltiare

Iterating over named regex groups in Rust

I wish to extract all named groups from a match into a HashMap and I'm running into a "does not live long enough" error while trying to compile this code:

extern crate regex;

use std::collections::HashMap;
use regex::Regex;

pub struct Route {
    regex: Regex,
}

pub struct Router<'a> {
    pub namespace_seperator: &'a str,
    routes: Vec<Route>,
}

impl<'a> Router<'a> {
    // ...

    pub fn path_to_params(&self, path: &'a str) -> Option<HashMap<&str, &str>> {
        for route in &self.routes {
            if route.regex.is_match(path) {
                let mut hash = HashMap::new();
                for cap in route.regex.captures_iter(path) {
                    for (name, value) in cap.iter_named() {
                        hash.insert(name, value.unwrap());
                    }
                }
                return Some(hash);
            }
        }
        None
    }
}

fn main() {}

Here's the error output:

error: `cap` does not live long enough
  --> src/main.rs:23:42
   |>
23 |>                     for (name, value) in cap.iter_named() {
   |>                                          ^^^
note: reference must be valid for the anonymous lifetime #1 defined on the block at 18:79...
  --> src/main.rs:18:80
   |>
18 |>     pub fn path_to_params(&self, path: &'a str) -> Option<HashMap<&str, &str>> {
   |>                                                                                ^
note: ...but borrowed value is only valid for the for at 22:16
  --> src/main.rs:22:17
   |>
22 |>                 for cap in route.regex.captures_iter(path) {
   |>                 ^

Obviously I still have a thing or two to learn about Rust lifetimes.

Answers (2)

Shepmaster

Matthieu M. has already explained the lifetime situation well. The good news is that the regex crate's maintainers recognized the problem, and a fix is in the pipeline for 1.0.

As stated in the commit message:

It was always possible to work around this by using indices.
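The indices workaround can be sketched without the regex crate. The Spans type and the params function below are hypothetical stand-ins, assuming only that the capture object can report each group's (start, end) byte offsets by value. Slicing the haystack with those offsets produces &str values that borrow only the haystack, so they remain valid after the capture object is dropped. (In regex 1.x the analogous calls would be Captures::get(i) followed by Match::start and Match::end.)

```rust
use std::collections::HashMap;

// Hypothetical stand-in for one match: it stores only byte offsets into the
// haystack, one optional (start, end) span per capture group.
struct Spans {
    spans: Vec<Option<(usize, usize)>>,
}

impl Spans {
    // Returns the offsets by value, so the result does not borrow `self`.
    fn pos(&self, i: usize) -> Option<(usize, usize)> {
        self.spans.get(i).copied().flatten()
    }
}

// `names[i]` is the name of group `i` (`None` for unnamed groups).
// Values are sliced from `haystack`, so the map borrows `haystack`, not `cap`.
fn params<'n, 't>(
    names: &[Option<&'n str>],
    haystack: &'t str,
    matches: Vec<Spans>,
) -> HashMap<&'n str, &'t str> {
    let mut map = HashMap::new();
    for cap in matches {
        for (i, name) in names.iter().enumerate() {
            if let (Some(name), Some((start, end))) = (name, cap.pos(i)) {
                map.insert(*name, &haystack[start..end]);
            }
        }
        // `cap` is dropped here, but nothing stored in `map` borrows from it.
    }
    map
}
```

Because only plain integers flow from the capture object into the map, the borrow checker has nothing tying the map to cap.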

It is also possible to work around this by using Regex::capture_names, although it's a bit more nested this way:

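pub fn path_to_params(&self, path: &'a str) -> Option<HashMap<&str, &str>> {
    for route in &self.routes {
        // A single `captures` call both tests for a match and extracts the groups.
        if let Some(captures) = route.regex.captures(path) {
            let mut hash = HashMap::new();
            // `capture_names` yields `None` for unnamed groups (including group 0).
            for name in route.regex.capture_names() {
                if let Some(name) = name {
                    // Optional groups that did not participate yield `None` and are skipped.
                    if let Some(value) = captures.name(name) {
                        // In regex 1.x, `name` returns a `Match`; `as_str()` borrows only `path`.
                        hash.insert(name, value.as_str());
                    }
                }
            }
            return Some(hash);
        }
    }
    None
}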

Note that I also removed the outer is_match check: running the regex once to test for a match and then again to extract the captures is inefficient.

Matthieu M.

Let's follow the lifetime lines:

  • route.regex.captures_iter(path) creates a FindCapture<'r, 't> where the lifetime 'r is that of route.regex and the lifetime 't is that of path
  • this iterator yields a Captures<'t>, only linked to the lifetime of path
  • whose method iter_named(&'t self) borrows cap for that same lifetime 't, so the SubCapture<'t> it returns is linked both to the lifetime of path and to the borrow of cap
  • this iterator yields a (&'t str, Option<&'t str>), so both the keys and the values of the HashMap are linked to the lifetime of path and to the borrow of cap

Therefore, it is unfortunately impossible to have the HashMap outlive the cap variable: cap is dropped at the end of each iteration of the for loop, and the code uses it as a "marker" that keeps the buffers containing the groups alive.
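The chain above can be modelled with a small std-only sketch. Caps<'t> is a hypothetical, simplified type, not the regex crate's Captures: named_tied ties its results to the borrow of self, the way iter_named(&'t self) does, while named_free ties them only to the haystack lifetime 't. Only the second shape lets a map built inside the loop outlive each cap.

```rust
use std::collections::HashMap;

// Hypothetical model of a capture result: it borrows the haystack for 't
// and records one (name, start, end) triple per named group.
struct Caps<'t> {
    haystack: &'t str,
    groups: Vec<(&'static str, usize, usize)>,
}

impl<'t> Caps<'t> {
    // Pre-1.0 shape: the returned slices are tied to the borrow of `self`,
    // so they cannot outlive this `Caps` value.
    #[allow(dead_code)]
    fn named_tied(&self) -> Vec<(&str, &str)> {
        self.groups.iter().map(|&(n, s, e)| (n, &self.haystack[s..e])).collect()
    }

    // Fixed shape: the returned slices are tied only to the haystack lifetime 't.
    fn named_free(&self) -> Vec<(&'static str, &'t str)> {
        self.groups.iter().map(|&(n, s, e)| (n, &self.haystack[s..e])).collect()
    }
}

// Mirrors the question's loop: each `cap` is dropped at the end of its
// iteration, yet the map survives because it only borrows the haystack.
fn build_map<'t>(all: Vec<Caps<'t>>) -> HashMap<&'static str, &'t str> {
    let mut map = HashMap::new();
    for cap in all {
        for (name, value) in cap.named_free() {
            map.insert(name, value);
        }
        // Calling `cap.named_tied()` here instead would fail to compile,
        // because the borrowed `cap` does not live long enough.
    }
    map
}
```

The two methods have identical bodies; only the lifetime in the return type differs, which is exactly what decides whether the question's code compiles.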

I am afraid that the only solution without significant restructuring is to return a HashMap<String, String>, as unsatisfying as that is. It also occurs to me that a single capture group may match multiple times (once per match yielded by captures_iter), in which case later values overwrite earlier ones in the HashMap; I'm not sure whether you want to bother with this.
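A sketch of the owned-String approach, assuming the input is the (name, Option<value>) pairs that iter_named yields; to_owned_map and to_multi_map are hypothetical helper names. Copying each slice with to_owned() breaks the link to cap at the cost of one allocation per key and value. The second helper keeps every match of a group instead of letting later matches overwrite earlier ones.

```rust
use std::collections::HashMap;

// Copies borrowed (name, value) pairs into an owned map, so the result no
// longer borrows the capture value or the haystack.
// Groups that did not participate (`None`) are skipped; if a name appears
// more than once, the last value wins.
fn to_owned_map<'a, I>(pairs: I) -> HashMap<String, String>
where
    I: IntoIterator<Item = (&'a str, Option<&'a str>)>,
{
    pairs
        .into_iter()
        .filter_map(|(name, value)| value.map(|v| (name.to_owned(), v.to_owned())))
        .collect()
}

// Variant that keeps every value seen for a name, in the order encountered.
fn to_multi_map<'a, I>(pairs: I) -> HashMap<String, Vec<String>>
where
    I: IntoIterator<Item = (&'a str, Option<&'a str>)>,
{
    let mut map: HashMap<String, Vec<String>> = HashMap::new();
    for (name, value) in pairs {
        if let Some(v) = value {
            map.entry(name.to_owned()).or_default().push(v.to_owned());
        }
    }
    map
}
```

Inside the question's loop, the pairs from each cap.iter_named() could be fed through either helper before cap goes out of scope.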