Idiomatic way to parse TSV file (ASCII)

Question

I need to parse files containing tab-separated numbers, and I know there will always be exactly two numbers per line. Since my files can be as large as a few gigabytes, I wonder whether my current parsing method is appropriate. It looks like I could make the `map` faster, given that the number of fields is fixed, but I couldn't figure out how.

```rust
use std::io::{self, prelude::*, BufReader};

type Record = (u32, u32);

fn read(content: &[u8]) -> io::Result<Vec<Record>> {
    Ok(BufReader::new(content)
        .lines()
        .map(|line| {
            // Collect every field of the line into a temporary Vec
            let nums: Vec<u32> = line
                .unwrap()
                .split("\t")
                .map(|s| s.parse::<u32>().unwrap())
                .collect();
            (nums[0], nums[1])
        })
        .collect::<Vec<_>>())
}

fn main() -> io::Result<()> {
    let content = "1\t1\n\
                   2\t2\n";
    let records = read(content.as_bytes())?;
    assert_eq!(records.len(), 2);
    assert_eq!(records[0], (1, 1));
    assert_eq!(records[1], (2, 2));
    Ok(())
}
```

creativcoder · Accepted Answer

If your entries are only numbers, we can avoid allocating an inner Vec for every line inside `map`, like so:


```rust
use std::io::{self, prelude::*, BufReader};

type Record = (u32, u32);

fn read(content: &[u8]) -> io::Result<Vec<Record>> {
    Ok(BufReader::new(content)
        .lines()
        .map(|line| {
            let line = line.unwrap();
            // Parse lazily: the two numbers are pulled straight off the
            // iterator, so no per-line Vec is allocated
            let mut pair = line.split("\t").map(|s| s.parse::<u32>().unwrap());
            (pair.next().unwrap(), pair.next().unwrap())
        })
        .collect::<Vec<_>>())
}

fn main() -> io::Result<()> {
    let content = "1\t1\n\
                   2\t2\n";
    let records = read(content.as_bytes())?;
    assert_eq!(records.len(), 2);
    assert_eq!(records[0], (1, 1));
    assert_eq!(records[1], (2, 2));
    Ok(())
}
```
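Because every line has exactly two fields, `str::split_once` (stable since Rust 1.52) can replace the iterator entirely: it splits at the first tab and returns both halves as a tuple, so there is nothing to call `next()` on or index into. This is a minimal sketch, assuming each line contains exactly one tab; like the code above, it still panics on malformed input.

```rust
use std::io::{self, prelude::*, BufReader};

type Record = (u32, u32);

// Sketch: assumes every line holds exactly two tab-separated u32 values.
fn read(content: &[u8]) -> io::Result<Vec<Record>> {
    let mut records: Vec<Record> = Vec::new();
    for line in BufReader::new(content).lines() {
        // lines() already strips the trailing "\n" or "\r\n"
        let line = line?;
        // split_once splits at the first tab only and yields both halves
        let (a, b) = line.split_once('\t').expect("line must contain a tab");
        records.push((
            a.parse::<u32>().expect("first field is not a u32"),
            b.parse::<u32>().expect("second field is not a u32"),
        ));
    }
    Ok(records)
}

fn main() -> io::Result<()> {
    let records = read(b"1\t1\n2\t2\n")?;
    assert_eq!(records, vec![(1, 1), (2, 2)]);
    Ok(())
}
```

The explicit loop with an annotated `Vec<Record>` is a design choice for readability; the iterator-and-`collect` style above works just as well with `split_once` inside the closure.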

You may want to add better error handling :)
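One way to act on that suggestion, sketched under a few assumptions: the function is generic over `BufRead`, so it accepts a byte slice in tests or a `BufReader<File>` for multi-gigabyte inputs; a single `String` buffer is reused through `read_line` instead of allocating a fresh `String` per line as `lines()` does; and malformed lines become `io::ErrorKind::InvalidData` errors tagged with the line number instead of panics. The `invalid` helper is hypothetical, introduced only for this example.

```rust
use std::io::{self, BufRead};

type Record = (u32, u32);

// Hypothetical helper: wrap any displayable error as an InvalidData io::Error.
fn invalid<E: std::fmt::Display>(line_no: usize, err: E) -> io::Error {
    io::Error::new(io::ErrorKind::InvalidData, format!("line {}: {}", line_no, err))
}

// Generic over BufRead; reuses one String buffer for every line.
fn read<R: BufRead>(mut reader: R) -> io::Result<Vec<Record>> {
    let mut records = Vec::new();
    let mut buf = String::new();
    let mut line_no = 0;
    loop {
        buf.clear();
        if reader.read_line(&mut buf)? == 0 {
            break; // end of input
        }
        line_no += 1;
        // Strip the trailing "\n" or "\r\n" kept by read_line
        let line = buf.trim_end_matches(&['\r', '\n'][..]);
        let (a, b) = line
            .split_once('\t')
            .ok_or_else(|| invalid(line_no, "expected two tab-separated fields"))?;
        let a = a.parse::<u32>().map_err(|e| invalid(line_no, e))?;
        let b = b.parse::<u32>().map_err(|e| invalid(line_no, e))?;
        records.push((a, b));
    }
    Ok(records)
}

fn main() -> io::Result<()> {
    let records = read(&b"1\t1\n2\t2\n"[..])?;
    assert_eq!(records, vec![(1, 1), (2, 2)]);

    // A malformed line yields an error instead of a panic
    let err = read(&b"1\t1\n2\tx\n"[..]).unwrap_err();
    assert_eq!(err.kind(), io::ErrorKind::InvalidData);
    Ok(())
}
```

For a real file you would call it as `read(BufReader::new(File::open(path)?))`, with `std::fs::File` and `std::io::BufReader` imported and `path` pointing at your input.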
