Shmoopy

Reputation: 5534

Is it possible to parse a text file using Rust's csv crate?

I have a text file with multiple lines. Is it possible to use Rust's csv crate to parse it such that each line is parsed into a different record?

I've tried specifying b'\n' as the field delimiter and leaving the record terminator as the default, since lines can end with either \r\n or just \n. However, this raises the UnequalLengths error unless the flexible option is specified, apparently because newlines take precedence over field delimiters, so the code below:

use csv::{ByteRecord, ReaderBuilder};

fn main() {
    let data = "foo,foo2\r\nbar,bar2\nbaz\r\n";
    let mut reader = ReaderBuilder::new()
        .delimiter(b'\n')
        .has_headers(false)
        .flexible(true)
        .from_reader(data.as_bytes());
    let mut record = ByteRecord::new();
    loop {
        match reader.read_byte_record(&mut record) {
            Ok(true) => {},
            Ok(false) => { break },
            Err(csv_error) => {
                println!("{}", csv_error);
                break;
            }
        }
        println!("fields: {}", record.len());
        for field in record.iter() {
            println!("{:?}", std::str::from_utf8(field))
        }
    }
}

prints:

fields: 1
Ok("foo,foo2")
fields: 2
Ok("bar,bar2")
Ok("baz")

I would like the string to be parsed into 3 records with one field each, so the expected output would be:

fields: 1
Ok("foo,foo2")
fields: 1
Ok("bar,bar2")
fields: 1
Ok("baz")

Is it possible to tweak the CSV reader somehow to obtain that behavior?

Conceptually I'd like the field delimiter to be None, but it seems that the delimiter must be a single u8 value.

Upvotes: 0

Views: 811

Answers (1)

BurntSushi5

Reputation: 15344

I guess I'll re-post my comment as the answer. More succinctly, as the author of the csv crate, I'd say the answer to your question is "no."

Firstly, it's not clear to me why you're trying to use a csv parser for this task at all. As the comments indicate, it's likely that your question is under-specified. Nevertheless, it seems more prudent to just write your own parser.

Secondly, setting both the delimiter and the terminator to the same thing is probably a condition in which the csv reader should panic or return an error. It doesn't really make sense from the perspective of the parser, and its behavior is likely unspecified.

Finally, it seems to me like your desired output indicates that you should just iterate over the lines in your input. That should give you exactly the output you want, since line iteration handles both \n and \r\n.
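
A minimal sketch of what iterating over the lines could look like, assuming the whole input is already in memory as a &str. The helper name records is hypothetical, chosen only for illustration. It relies on the standard library's str::lines, which treats both \n and \r\n as line endings and does not yield an empty trailing item for a final line ending.

```rust
/// Splits the input into one single-field "record" per line.
/// `str::lines` accepts both "\n" and "\r\n" as line endings.
/// (`records` is a hypothetical helper name, not part of any crate.)
fn records(data: &str) -> Vec<&str> {
    data.lines().collect()
}

fn main() {
    // Same mixed-line-ending input as in the question.
    let data = "foo,foo2\r\nbar,bar2\nbaz\r\n";
    for record in records(data) {
        // Each line is treated as a record with exactly one field.
        println!("fields: 1");
        println!("{:?}", record);
    }
}
```

When reading from a file rather than a string, std::io::BufRead::lines on a BufReader should behave similarly with respect to \n and \r\n, though it yields io::Result<String> items that need error handling.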

Upvotes: 2
