createproblem

Reputation: 1612

How to escape escaped regex characters when using Rust's regex crate?

I have a regex that contains many escaped " and \ characters. I tested the regex on its own and it works (see my live demo). I then transferred it to Rust. Here is a simplified example that does not work:

extern crate regex; // 1.1.0
use regex::Regex;

fn main() {
    let re = Regex::new(r#"123 \\""(\w+)"#).unwrap();
    let test = "123 \"PROPFIND\"";

    for cap in re.captures_iter(test) {
        println!("{}", &cap[1]);
    }
}


The output of my example is empty, but I expect PROPFIND.

The regex documentation pointed me to the raw string documentation. I played around with different escaping techniques, but can't figure out where I messed up.

Upvotes: 7

Views: 6707

Answers (1)

Wiktor Stribiżew

Reputation: 627219

Your original pattern needs to be written as:

let re = Regex::new(r#"(\d{1,3}(?:\.\d{1,3}){3}) (\w+|-) (\w+|-) \[(.*?)\] "(\w+) (.*?) (HTTPS?)/([0-9]\.[0-9])" ([0-9]+) ([0-9]+) "(\w+|-)" "(.*?)""#).unwrap();

The simplified pattern from your question should be written as:

let re = Regex::new(r#"123 "(\w+)""#).unwrap();

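In short, every \\"" in your pattern should be a plain ". Inside a raw string no escapes are processed, so the regex engine receives \\ (match one literal backslash) followed by "" (match two quotes), and the test string contains neither. Also make sure the pattern is enclosed in r#" and "#.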

Please refer to the Rust raw string literals reference:

Raw string literals do not process any escapes. They start with the character U+0072 (r), followed by zero or more of the character U+0023 (#) and a U+0022 (double-quote) character. The raw string body can contain any sequence of Unicode characters and is terminated only by another U+0022 (double-quote) character, followed by the same number of U+0023 (#) characters that preceded the opening U+0022 (double-quote) character.
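To see what the quoted rule means in practice, here is a minimal sketch that uses only the standard library (no regex crate) and prints the exact text each literal hands to Regex::new. The function names question_pattern, answer_pattern and answer_pattern_escaped are hypothetical helpers made up for this illustration.

```rust
// Pattern from the question. Inside r#"..."# no escapes are processed, so the
// text keeps both backslashes and both quotes exactly as typed.
fn question_pattern() -> &'static str {
    r#"123 \\""(\w+)"#
}

// Pattern from the answer. A bare `"` is fine inside r#"..."# because only
// the sequence `"#` ends the literal.
fn answer_pattern() -> &'static str {
    r#"123 "(\w+)""#
}

// The same pattern as an ordinary string literal, where `"` and `\` must each
// be escaped for the Rust compiler.
fn answer_pattern_escaped() -> &'static str {
    "123 \"(\\w+)\""
}

fn main() {
    println!("{}", question_pattern()); // prints: 123 \\""(\w+)
    println!("{}", answer_pattern()); // prints: 123 "(\w+)"

    // Both spellings of the answer's pattern produce identical text.
    assert_eq!(answer_pattern(), answer_pattern_escaped());

    // More `#` characters are only needed when the text itself contains `"#`.
    let with_hash = r##"a "# b"##;
    println!("{}", with_hash); // prints: a "# b
}
```

The first printed line shows why the question's regex finds nothing: the pattern text really contains \\ and "", which the regex engine reads as one literal backslash followed by two quotes.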

Upvotes: 12
