exocortex

Reputation: 513

What is the idiomatic way in Rust's nom to turn (map) a parsing error into an Ok result?

I'm not sure whether I'm thinking about the whole thing the wrong way. Maybe there is a simpler solution.

In nom I want to parse C-style single-line comments. Each line that I parse could contain a "// some comment" on the right side. I wrote a parser that can parse these comments:

use nom::{bytes::complete::{is_not, tag}, combinator::recognize, sequence::pair, IResult};

pub fn parse_single_line_comments(i: &str) -> IResult<&str, &str> {
    recognize(pair(tag("//"), is_not("\n\r")))(i)
}

It works when a comment is present. Unfortunately, if there is no comment, it returns an error. I would like it to return an empty string instead (later I could return an Option, which would be more elegant). While learning nom I've run into this problem quite often: I want to replace an error with a custom Ok variant. But I'm never sure whether I did it the "right", i.e. idiomatic, way for nom/Rust. It always felt ugly, because I ended up matching on the return value of the parsing function. Think of it like this:

pub fn parse_single_line_comments(i: &str) -> IResult<&str, &str> {
    match recognize(pair(tag("//"), is_not("\n\r")))(i) {
        Ok((rest, comment)) => Ok((rest, comment)),
        // Any error (no comment found) is turned into an empty comment
        _ => Ok((i, "")),
    }
}

It looks kind of strange to me. There should be a better way to do this, right?

Upvotes: 0

Views: 252

Answers (1)

vallentin

Reputation: 26215

You already hinted at it yourself. You could use opt to parse zero or one line comment, or many0 to parse zero or more. Then combine that with preceded, and you can easily discard any number of comments (and whitespace).

Let's consider a simple parse_ident that parses identifiers:

use nom::bytes::complete::take_while1;
use nom::{AsChar, IResult};

fn parse_ident(input: &str) -> IResult<&str, &str> {
    take_while1(|c: char| c.is_alpha() || (c == '_'))(input)
}

Now let's say we want to skip any whitespace and comments that come before the identifier. First, we can define our line comment parser (which you already did):

fn parse_single_line_comment(input: &str) -> IResult<&str, &str> {
    recognize(pair(tag("//"), is_not("\n\r")))(input)
}

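Now we'll change parse_ident to use preceded and many0 to skip zero or more line comments. We can also throw in multispace1 to skip whitespace as well. Note that it has to be multispace1 rather than multispace0: many0 returns an error if its inner parser succeeds without consuming any input, as a guard against looping forever.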

use nom::branch::alt;
use nom::bytes::complete::{is_not, tag, take_while1};
use nom::character::complete::multispace1;
use nom::combinator::recognize;
use nom::multi::many0;
use nom::sequence::{pair, preceded};
use nom::{AsChar, IResult};

fn parse_ident(input: &str) -> IResult<&str, &str> {
    preceded(
        // Parsers to skip anything that is ignored
        many0(alt((
            parse_single_line_comment,
            multispace1,
        ))),
        // Identifier parsing
        take_while1(|c: char| c.is_alpha() || (c == '_')),
    )(input)
}

This now allows us to successfully parse all of the following:

assert_eq!(
    parse_ident("identifier"),
    Ok(("", "identifier"))
);
assert_eq!(
    parse_ident("     identifier"),
    Ok(("", "identifier"))
);
assert_eq!(
    parse_ident("// Comment\n  identifier"),
    Ok(("", "identifier"))
);
assert_eq!(
    parse_ident("// Comment\n// Comment\n  identifier"),
    Ok(("", "identifier"))
);

Depending on what you're parsing, you'll need to sprinkle that preceded across various parsers. We can reduce the duplicated code a bit by introducing our own skip_ignored parser:

fn skip_ignored<'a, F>(parser: F) -> impl FnMut(&'a str) -> IResult<&'a str, &'a str>
where
    F: FnMut(&'a str) -> IResult<&'a str, &'a str>,
{
    preceded(
        many0(alt((
            parse_single_line_comment,
            multispace1,
        ))),
        parser,
    )
}

fn parse_ident(input: &str) -> IResult<&str, &str> {
    skip_ignored(
        take_while1(|c: char| c.is_alpha() || (c == '_')),
    )(input)
}

Whether there are easier ways to do this depends heavily on your data. But as long as you simply want to discard whitespace and comments, it's relatively straightforward.


Since you also asked about custom errors: you can define your own enum as you normally would, and then implement ParseError for it:

use nom::error::{ErrorKind, ParseError};

#[derive(Debug)]
pub enum MyParseError<'a> {
    IdentTooLong,
    Nom(&'a str, ErrorKind),
}

impl<'a> ParseError<&'a str> for MyParseError<'a> {
    fn from_error_kind(input: &'a str, kind: ErrorKind) -> Self {
        Self::Nom(input, kind)
    }

    fn append(_: &'a str, _: ErrorKind, other: Self) -> Self {
        other
    }
}

Using it could look like this:

use nom::bytes::complete::take_while1;
use nom::{AsChar, IResult};

fn parse_ident<'a>(input: &'a str) -> IResult<&'a str, &'a str, MyParseError<'a>> {
    let (input, ident) = take_while1(|c: char| c.is_alpha() || (c == '_'))(input)?;

    // Return error if identifier is longer than 10 bytes
    if ident.len() > 10 {
        Err(nom::Err::Failure(MyParseError::IdentTooLong))
    } else {
        Ok((input, ident))
    }
}

fn main() {
    println!("{:?}", parse_ident(""));
    // Err(Error(Nom("", TakeWhile1)))

    println!("{:?}", parse_ident("hello"));
    // Ok(("", "hello"))

    println!("{:?}", parse_ident("this_is_a_very_long_name"));
    // Err(Failure(IdentTooLong))
}

There's also FromExternalError, which works hand in hand with map_res. This is useful if, say, you want to call str::parse() and easily map its error into your MyParseError.


Upvotes: 1
