DownloadPizza
DownloadPizza

Reputation: 3466

Match string prefixes with different length

I am trying to tokenize boolean algebra, so I need to check if the start of a str matches a list of tokens (e.g. AND, OR, etc.). str.startsWith in an if else would obviously work, but it is an ugly solution, so I am wondering if theres a way to match an str like

match st {
  "OR", .. => ...
}

(obviously not real syntax)

Also: I can not rely on whitespaces or other characters following the String, ABxxxx should not match, while ORxxxxx should match.

Upvotes: 2

Views: 1165

Answers (1)

BurntSushi5
BurntSushi5

Reputation: 15354

I think if you only need to match a few strings, then an if ... else if tree or Martin's solution is probably best. Otherwise, I don't think there is a way to match string slice prefixes using match.

If, however, you have many strings you need to match, I'd recommend AhoCorasick, which will be algorithmically optimal. You can use the aho-corasick crate.

use aho_corasick::{AhoCorasick, AhoCorasickBuilder, MatchKind};

const PATTERNS: &[&str] = &["OR", "AND"];

fn op<T: AsRef<[u8]>>(ac: &AhoCorasick, v: T) -> Option<&'static str> {
    ac.find(v).map(|m| PATTERNS[m.pattern()])
}

fn main() {
    let ac = AhoCorasickBuilder::new()
        .auto_configure(PATTERNS)
        .anchored(true)
        .match_kind(MatchKind::LeftmostFirst)
        .build(PATTERNS);
    println!(
        "{:?} / {:?} / {:?}",
        op(&ac, "OR"),
        op(&ac, "AND"),
        op(&ac, "XOR")
    )
}

The trick to make it work for prefixes here is to set anchored(true). This will ensure that any match returned must be anchored to the beginning of the haystack.

The LeftmostFirst match kind setting may not be necessary, but it instructs the matcher to prefer earlier patterns over later patterns. e.g., if your patterns are Samwise and Sam, then LeftmostFirst will ensure that Samwise matches Samwise and not Sam. If none of your patterns have overlapping prefixes, then you can remove this setting.

Finally, the auto_configure option will enable some optimizations in the automaton, such as using a DFA instead of an NFA if the number of patterns is small enough.

See it also on the playground.

Upvotes: 3

Related Questions