Reputation: 131
Currently I have a program that reads some emails from disk and parses some included text (which is CSV-like, although it happens to use fixed-width fields and '|' as the separator).
The emails are not particularly huge, so I fs::read_to_string each one into a String (in a loop), use .split("\n") to iterate over its lines, and then run a constructor on each line to create a struct for each valid CSV-like line.
So like
let mut hostiter = text.split("\n")
.filter_map(|x| HostInfo::from_str(x));
Where HostInfo has owned values, copying from the &str references.
This all works fine as is, but now I want to be able to handle emails that quote the records I'm looking for (i.e. lines that start with "> > "). That's easy enough:
let quotes = &['>', ' '];
let mut hostiter = text.split("\n")
.map(|x| x.trim_start_matches(quotes))
.filter_map(|x| HostInfo::from_str(x));
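For anyone who wants to try this out, here is a minimal self-contained sketch of the quote-stripping pipeline. The HostInfo fields (name, addr) and the two-field '|' format are assumptions made up for illustration, since the real struct isn't shown, and parse_hosts is a hypothetical helper name.

```rust
/// Hypothetical record type; the real HostInfo fields are not shown here.
#[derive(Debug, PartialEq)]
struct HostInfo {
    name: String,
    addr: String,
}

impl HostInfo {
    /// Parses one '|'-separated line. Returns None unless the line has
    /// exactly two non-empty fields (an assumed format, for illustration).
    fn from_str(line: &str) -> Option<HostInfo> {
        let mut fields = line.split('|').map(str::trim);
        let name = fields.next()?;
        let addr = fields.next()?;
        if name.is_empty() || addr.is_empty() || fields.next().is_some() {
            return None;
        }
        Some(HostInfo {
            name: name.to_owned(),
            addr: addr.to_owned(),
        })
    }
}

/// Strips any leading run of '>' and ' ' from each line, then keeps the
/// lines that parse as records.
fn parse_hosts(text: &str) -> Vec<HostInfo> {
    let quotes: &[char] = &['>', ' '];
    text.lines()
        .map(|x| x.trim_start_matches(quotes))
        .filter_map(HostInfo::from_str)
        .collect()
}
```

Note that trim_start_matches with this pattern also removes leading spaces from unquoted lines, which may matter if the first fixed-width field can start with padding.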
I also need to cope with RFC 3676 (format=flowed) emails. This means that, when forwarded/replied to, email clients split the lines so that each record I'm looking for is spread over 2 or more lines. A line that continues onto the next one ends with " \r\n", i.e. it has a space before the CR/LF. A line that does not continue has the "\r\n" after a non-space character. (Currently my code skips these partial records.) I need an iterator that iterates over complete lines. I'm thinking of two ways of doing this:
I realize I can't have an iterator that returns complete records as a single &str (unless I do 1.), since the records are split in the original string. However I can refactor my constructor to take a vector of &str instead of a single &str.
Upvotes: 0
Views: 40
Reputation: 131
In the end I used coalesce (from the itertools crate) to group the lines. Since the items I'm iterating over are &str, which can't be joined without allocating, I decided to store the output as Vec<&str>. Since coalesce wants the same type for input and output (why?), I needed to convert each &str to a single-item vector before using it. The resulting code was:
let mut hostiter = text.split("\r\n")
.map(|x| vec![x.trim_start_matches(quotes)])
.coalesce(|mut x, mut y| match o.flowed && x[x.len()-1].ends_with(' ') {
true => { x.append(&mut y); Ok( x )},
false => Err( (x,y) ),
})
.filter_map(|x| HostInfo::from_vec_str(x));
(o.flowed is a flag indicating whether we picked up a Content-Type: header with format=flowed in the headers of the email.)
I had to convert my HostInfo::from_str function to HostInfo::from_vec_str to take a Vec<&str> instead of a &str. Since my from_str function splits the &str on spaces anyway, it was easy enough to use flat_map to split each &str in the Vec and output words...
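Roughly, the converted constructor could look like the sketch below. The HostInfo fields (name, addr) and the rule of skipping bare "|" tokens are assumptions for illustration only; the point is just the flat_map that turns the Vec of line fragments into a single stream of words.

```rust
/// Hypothetical record type; the real HostInfo fields are not shown here.
#[derive(Debug, PartialEq)]
struct HostInfo {
    name: String,
    addr: String,
}

impl HostInfo {
    /// Builds a record from the fragments of one logical line.
    /// Each fragment is split into words, so it does not matter where the
    /// mail client broke the line (as long as it broke between words).
    fn from_vec_str(parts: Vec<&str>) -> Option<HostInfo> {
        let mut words = parts
            .into_iter()
            .flat_map(|part| part.split_whitespace())
            // Assumed format: a bare "|" token is only a field separator.
            .filter(|w| *w != "|");
        let name = words.next()?;
        let addr = words.next()?;
        Some(HostInfo {
            name: name.to_owned(),
            addr: addr.to_owned(),
        })
    }
}
```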
Not sure if coalesce is the best way to do this. I was looking for an iterator adaptor that would take a closure that takes a collection and an item and returns a bool, i.e. does this item belong with the other items in this collection? The iterator adaptor's output would iterate over collections of items.
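For what it's worth, an adaptor of that shape is not hard to write by hand with only the standard library. The sketch below is one possible version; GroupWhile, group_while and unflow are names I made up, not an existing itertools or std API. The closure receives the group built so far plus the next item, and returns whether that item belongs in the group.

```rust
use std::iter::Peekable;

/// Hypothetical adaptor: yields Vecs of consecutive items, asking
/// `belongs(group_so_far, next_item)` before each append.
struct GroupWhile<I: Iterator, F> {
    iter: Peekable<I>,
    belongs: F,
}

impl<I, F> Iterator for GroupWhile<I, F>
where
    I: Iterator,
    F: FnMut(&[I::Item], &I::Item) -> bool,
{
    type Item = Vec<I::Item>;

    fn next(&mut self) -> Option<Self::Item> {
        // Start a new group with the next item, or stop if there is none.
        let mut group = vec![self.iter.next()?];
        // Keep taking items for as long as the closure accepts them.
        while let Some(item) = self.iter.peek() {
            if !(self.belongs)(&group, item) {
                break;
            }
            let item = self.iter.next().expect("peeked item exists");
            group.push(item);
        }
        Some(group)
    }
}

fn group_while<I, F>(iter: I, belongs: F) -> GroupWhile<I, F>
where
    I: Iterator,
    F: FnMut(&[I::Item], &I::Item) -> bool,
{
    GroupWhile { iter: iter.peekable(), belongs }
}

/// Groups flowed lines: a line ending in a space continues onto the next.
/// Simplified: quote prefixes are not stripped here.
fn unflow(text: &str) -> Vec<Vec<&str>> {
    group_while(text.split("\r\n"), |group, _next| {
        group.last().map_or(false, |prev| prev.ends_with(' '))
    })
    .collect()
}
```

Compared with coalesce, this avoids wrapping every &str in a one-element Vec up front, since the input and output item types are allowed to differ.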
Upvotes: 0