mottosson

Reputation: 3763

Parse string with escaped single quotes

I want to parse a string of ASCII characters enclosed in single quotes, where a literal single quote is escaped by writing two ' in a row. For example, this input:

'string value contained between single quotes -> '' and so on...'

which should result in:

string value contained between single quotes -> ' and so on...

use nom::{
    bytes::complete::{tag, take_while},
    error::{ErrorKind, ParseError},
    sequence::delimited,
    IResult,
};

fn main() {
    let res = string_value::<(&str, ErrorKind)>("'abc''def'");

    assert_eq!(res, Ok(("", "abc\'def")));
}

pub fn is_ascii_char(chr: char) -> bool {
    chr.is_ascii()
}

fn string_value<'a, E: ParseError<&'a str>>(i: &'a str) -> IResult<&'a str, &'a str, E> {
    delimited(tag("'"), take_while(is_ascii_char), tag("'"))(i)
}

How can I detect escaped quotes and not the end of the string?

Upvotes: 3

Views: 1706

Answers (2)

Just a learner

Reputation: 28572

I'm learning nom, and below is my attempt.

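use nom::{
    bytes::complete::{tag, take},
    sequence::delimited,
    IResult,
};

// Assumes the whole input is exactly one quoted string: everything between
// the first and the last character is taken as the body.
fn parser(input: &str) -> IResult<&str, &str> {
    // saturating_sub avoids an underflow panic on inputs shorter than two chars.
    let len = input.chars().count().saturating_sub(2);
    delimited(tag("'"), take(len), tag("'"))(input)
}

fn main() {
    let a = r###"'string value contained between single quotes -> '' and so on...'"###;

    let (remaining, matched) = parser(a).unwrap_or_default();

    // Collapse each escaped quote ('') into a single quote.
    let matched = matched.replace("''", "'");
    println!("remaining: {:#?}", remaining);
    println!("matched: {:#?}", matched);
}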

It prints this result:

remaining: ""
matched: "string value contained between single quotes -> ' and so on..."

I tested this with nom 6.2.1.

Upvotes: 0

edwardw

Reputation: 13942

This is pretty tricky, but the following works:

//# nom = "5.0.1"
use nom::{
    bytes::complete::{escaped_transform, tag},
    character::complete::none_of,
    combinator::{recognize, map_parser},
    multi::{many0, separated_list},
    sequence::delimited,
    IResult,
};

fn main() {
    let (_, res) = parse_quoted("'abc''def'").unwrap();
    assert_eq!(res, "abc'def");
    let (_, res) = parse_quoted("'xy@$%!z'").unwrap();
    assert_eq!(res, "xy@$%!z");
    let (_, res) = parse_quoted("'single quotes -> '' and so on...'").unwrap();
    assert_eq!(res, "single quotes -> ' and so on...");
}

fn parse_quoted(input: &str) -> IResult<&str, String> {
    let seq = recognize(separated_list(tag("''"), many0(none_of("'"))));
    let unquote = escaped_transform(none_of("'"), '\'', tag("'"));
    let res = delimited(tag("'"), map_parser(seq, unquote), tag("'"))(input)?;

    Ok(res)
}

Some explanations:

  1. the parser seq recognizes any sequence that alternates between an escaped quote (two single quotes in a row, '') and runs of any other characters;
  2. unquote transforms each escaped quote ('') into a single quote (');
  3. map_parser then feeds the slice matched by seq into unquote to produce the desired result.
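
To make the two steps concrete, here is a rough sketch of the same logic written by hand with only the standard library, without nom. The name parse_quoted_manual and the Option return type are my own choices for illustration, not something nom provides.

```rust
/// Hypothetical helper (not part of nom): parses a single-quoted string in
/// which two consecutive quotes ('') stand for one literal quote.
/// Returns (remaining input, unescaped value), or None if the input does not
/// start with a quote or the closing quote is missing.
fn parse_quoted_manual(input: &str) -> Option<(&str, String)> {
    // The opening delimiter.
    let body = input.strip_prefix('\'')?;

    // Step 1 (roughly what `seq` does): find where the quoted body ends.
    // Scanning bytes is safe here because ' is ASCII and never occurs inside
    // a multi-byte UTF-8 sequence.
    let bytes = body.as_bytes();
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'\'' {
            if bytes.get(i + 1) == Some(&b'\'') {
                i += 2; // escaped quote, keep scanning
                continue;
            }
            // Step 2 (roughly what `unquote` does): collapse '' into '.
            let value = body[..i].replace("''", "'");
            return Some((&body[i + 1..], value));
        }
        i += 1;
    }
    None // no closing quote found
}

fn main() {
    let (rest, value) = parse_quoted_manual("'abc''def' tail").unwrap();
    assert_eq!(value, "abc'def");
    assert_eq!(rest, " tail");
}
```

A lone quote ends the string, while a pair of quotes is skipped as an escape, which is the distinction the question asks about.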

Be aware that because of the escaped_transform combinator, the parse result is a String instead of a &str, i.e. there is an extra allocation.
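
If the allocation matters, one possible way around it (a sketch, not what the code above does) is to unescape into a std::borrow::Cow, so that a body without any escaped quote is returned borrowed. The function name unescape_quotes is hypothetical; it assumes it receives the body between the outer quotes, already validated.

```rust
use std::borrow::Cow;

/// Hypothetical helper: collapses each '' into ' and allocates only when the
/// body actually contains an escaped quote.
fn unescape_quotes(body: &str) -> Cow<'_, str> {
    if body.contains("''") {
        Cow::Owned(body.replace("''", "'"))
    } else {
        Cow::Borrowed(body)
    }
}

fn main() {
    // No escaped quote: the input slice is returned as-is, no allocation.
    assert!(matches!(unescape_quotes("abc"), Cow::Borrowed(_)));
    // An escaped quote forces a new String.
    assert!(matches!(unescape_quotes("abc''def"), Cow::Owned(_)));
    assert_eq!(&*unescape_quotes("abc''def"), "abc'def");
}
```

Presumably this could be applied to the slice recognized by seq using nom's map combinator in place of map_parser with escaped_transform, but I have not tested that combination.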

Upvotes: 6
