AlexKing

Reputation: 131

Rust reconstitute format=flowed emails, or an iterator that combines some elements of the previous iterator

Currently I have a program that reads some emails from disk and parses some included text (which is csv-like, although it happens to use fixed-width fields separated by '|').

The emails are not particularly huge, so I fs::read_to_string them into a string (in a loop), and for each one use .split("\n") to iterate over lines, then run a constructor on each line to create a struct for each valid csv-like line.

So like

let mut hostiter = text.split("\n")
    .filter_map(|x| HostInfo::from_str(x));

Where HostInfo has owned values, copying from the &str references.

This all works fine as is, but now I want to be able to handle emails that quote the records I'm looking for (i.e. lines that start with "> > "). That's easy enough:

    let quotes = &['>', ' '];
    let mut hostiter = text.split("\n")
        .map(|x| x.trim_start_matches(quotes))
        .filter_map(|x| HostInfo::from_str(x));

I also need to cope with RFC 3676 (format=flowed) emails. This means that, when forwarded or replied to, email clients split the lines so that each record I'm looking for is spread over 2 or more lines. Continuation lines are marked by " \r\n", i.e. a space immediately before the CR/LF. Non-continuation lines have the "\r\n" after a non-space character. (Currently my code skips these partial records.) I need an iterator that iterates over complete lines. I'm thinking of two ways of doing this:

  1. The easiest may be to split the string (on '\n'), trim the quoting from the start of each line, then collect the lines into a new string joined with '\n', so the quotes are gone. Then a second pass replaces every " \r\n" with ' ', again producing a new string. Now I have a string that can be split on '\n' and has complete records.
  2. Else is there an iterator adapter I can use that will combine elements if they are continuation lines? e.g. can I use group_by to group lines with their continuation lines?
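Option 1 might look roughly like the sketch below. It is a single pass rather than the two passes described, and `unflow` is a hypothetical name, not something from my actual code. It also assumes the text uses "\r\n" line endings throughout, and it ignores the other RFC 3676 details (space-stuffing and the "-- " signature separator).

```rust
/// Strip leading quote markers and rejoin flowed continuation lines.
/// Returns a new String with one complete record per '\n'-terminated line.
fn unflow(text: &str) -> String {
    let quotes: &[char] = &['>', ' '];
    let mut out = String::with_capacity(text.len());
    for line in text.split("\r\n") {
        let line = line.trim_start_matches(quotes);
        out.push_str(line);
        // A trailing space marks a soft break: keep the space and
        // skip the newline so the next line continues this record.
        if !line.ends_with(' ') {
            out.push('\n');
        }
    }
    out
}
```

The result owns its data, so it can be split on '\n' and fed to the existing HostInfo::from_str unchanged, at the cost of one extra allocation per email.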

I realize I can't have an iterator that returns complete records as a single &str (unless I do 1.), since the records are split in the original string. However I can refactor my constructor to take a vector of &str instead of a single &str.

Upvotes: 0

Views: 40

Answers (1)

AlexKing

Reputation: 131

In the end I used coalesce to group the lines. Since the items I'm iterating over are &str which can't be joined without allocation I decided to store the output as Vec<&str>. Since coalesce wants the same types as input and output (why?), I needed to convert the &str to single item vectors before using it. The resulting code was:

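use itertools::Itertools; // provides coalesce

let quotes = &['>', ' '];
let mut hostiter = text.split("\r\n")
    .map(|x| vec![x.trim_start_matches(quotes)])
    // Merge y into x while the last line collected in x ends with a space (soft break).
    .coalesce(|mut x, mut y| match o.flowed && x[x.len()-1].ends_with(' ') {
        true => { x.append(&mut y); Ok(x) },
        false => Err((x, y)),
    })
    .filter_map(|x| HostInfo::from_vec_str(x));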

(o.flowed is a flag indicating whether we found a Content-Type: header with format=flowed in the headers of the email.)
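For illustration, a flag like that could be derived along these lines. This is only a sketch, not my real header handling: `is_flowed` is a hypothetical helper, it assumes "\r\n" line endings, it only looks at the top-level headers (not at individual MIME parts), and it does not handle a quoted parameter value such as format="flowed".

```rust
/// Returns true if the top-level headers contain a Content-Type
/// header carrying a format=flowed parameter.
fn is_flowed(raw: &str) -> bool {
    // The header block ends at the first empty line.
    let headers = raw.split("\r\n\r\n").next().unwrap_or("");
    // Unfold headers that continue on a line starting with a space or tab.
    let unfolded = headers.replace("\r\n ", " ").replace("\r\n\t", " ");
    unfolded.split("\r\n").any(|line| {
        // Header names and this parameter are matched case-insensitively.
        let lower = line.to_ascii_lowercase();
        lower.starts_with("content-type:") && lower.contains("format=flowed")
    })
}
```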

I had to convert my HostInfo::from_str function to HostInfo::from_vec_str to take a Vec<&str> instead of a &str. Since my from_str function splits the &str on spaces anyway, it was easy enough to use flat_map to split each &str in the Vec and output words...
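The shape of that change is roughly as follows. The HostInfo here is a hypothetical stand-in with a single `fields` vector (the real struct and its validation are not shown), and it uses split_whitespace rather than a plain split on ' ' so that runs of padding spaces do not produce empty words.

```rust
/// Hypothetical stand-in for the real HostInfo; the actual fields differ.
#[derive(Debug, PartialEq)]
struct HostInfo {
    fields: Vec<String>,
}

impl HostInfo {
    /// Build a record from the pieces of one logical line.
    fn from_vec_str(parts: Vec<&str>) -> Option<HostInfo> {
        let fields: Vec<String> = parts
            .into_iter()
            // Split every piece into words and chain them into one stream,
            // so it no longer matters where the email client broke the line.
            .flat_map(|part| part.split_whitespace())
            .map(String::from)
            .collect();
        // Placeholder validity check; the real constructor would validate the fields.
        if fields.is_empty() {
            None
        } else {
            Some(HostInfo { fields })
        }
    }
}
```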

Not sure if coalesce is the best way to do this. I was looking for an iterator adaptor that takes a closure which receives a collection and an item and returns a bool, i.e. does this item belong with the other items in this collection? The adaptor's output would then iterate over collections of items.
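An adaptor like that is not hard to hand-roll with just the standard library. The sketch below is one possible shape, not an existing crate API: `GroupWhile`, `group_while` and `flowed_records` are all hypothetical names.

```rust
use std::iter::Peekable;

/// Hypothetical adaptor: yields Vecs of consecutive items, asking
/// `belongs(group_so_far, next_item)` whether the next item joins the group.
struct GroupWhile<I: Iterator, F> {
    iter: Peekable<I>,
    belongs: F,
}

fn group_while<I, F>(iter: I, belongs: F) -> GroupWhile<I, F>
where
    I: Iterator,
    F: FnMut(&[I::Item], &I::Item) -> bool,
{
    GroupWhile { iter: iter.peekable(), belongs }
}

impl<I, F> Iterator for GroupWhile<I, F>
where
    I: Iterator,
    F: FnMut(&[I::Item], &I::Item) -> bool,
{
    type Item = Vec<I::Item>;

    fn next(&mut self) -> Option<Self::Item> {
        // Every group starts with one item; no items left means we are done.
        let mut group = vec![self.iter.next()?];
        while let Some(item) = self.iter.peek() {
            if !(self.belongs)(group.as_slice(), item) {
                break;
            }
            // The peeked item belongs to this group, so consume it.
            group.extend(self.iter.next());
        }
        Some(group)
    }
}

/// Example use: group flowed lines with their continuation lines.
fn flowed_records(text: &str) -> Vec<Vec<&str>> {
    let quotes: &[char] = &['>', ' '];
    group_while(
        text.split("\r\n").map(|l| l.trim_start_matches(quotes)),
        // The next line belongs to the group if the last line ends in a space.
        |group, _next| group.last().map_or(false, |l| l.ends_with(' ')),
    )
    .collect()
}
```

Compared with coalesce, this avoids wrapping every line in a single-item Vec up front, since the input items and the output groups can have different types.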

Upvotes: 0
