Reputation: 133
I need to parse files containing tab-separated numbers, and I know each line will always hold exactly two of them. Since my files can be as large as a few gigabytes, I wondered whether my current parsing method is reasonable. It looks like I could make the map faster given that the number of fields is fixed, but I couldn't find out how.
use std::io::{self, prelude::*, BufReader};

type Record = (u32, u32);

fn read(content: &[u8]) -> io::Result<Vec<Record>> {
    Ok(BufReader::new(content)
        .lines()
        .map(|line| {
            let nums: Vec<u32> = line
                .unwrap()
                .split("\t")
                .map(|s| s.parse::<u32>().unwrap())
                .collect();
            (nums[0], nums[1])
        })
        .collect::<Vec<Record>>())
}

fn main() -> io::Result<()> {
    let content = "1\t1\n\
                   2\t2\n";
    let records = read(content.as_bytes())?;
    assert_eq!(records.len(), 2);
    assert_eq!(records[0], (1, 1));
    assert_eq!(records[1], (2, 2));
    Ok(())
}
Upvotes: 0
Views: 717
Reputation: 449
If each line holds only the two numbers, you can avoid the inner Vec that is allocated for every line inside map by pulling the two values straight from the split iterator, like so:
use std::io::{self, prelude::*, BufReader};

type Record = (u32, u32);

fn read(content: &[u8]) -> io::Result<Vec<Record>> {
    Ok(BufReader::new(content)
        .lines()
        .map(|line| {
            let line = line.unwrap();
            // Parse fields lazily; no intermediate Vec is built per line.
            let mut pair = line.split('\t').map(|s| s.parse::<u32>().unwrap());
            // Take exactly the two expected values; any extra fields are ignored.
            (pair.next().unwrap(), pair.next().unwrap())
        })
        .collect::<Vec<Record>>())
}

fn main() -> io::Result<()> {
    let content = "1\t1\n\
                   2\t2\n";
    let records = read(content.as_bytes())?;
    assert_eq!(records.len(), 2);
    assert_eq!(records[0], (1, 1));
    assert_eq!(records[1], (2, 2));
    Ok(())
}
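You may want to add better error handling :) As written, every unwrap panics on an I/O error, a missing field, or a value that is not a valid u32.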
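For what it's worth, here is a rough sketch of what that error handling could look like. It is only one possible approach, not the definitive one: `parse_line` and `invalid_data` are hypothetical helper names I made up, and mapping parse failures to `io::ErrorKind::InvalidData` is an assumption about how you want to report bad input. It also assumes a toolchain recent enough for `str::split_once` and inline format arguments.

```rust
use std::io::{self, prelude::*, BufReader};

type Record = (u32, u32);

// Hypothetical helper: wraps a parse problem in an io::Error so that
// `read` can keep returning io::Result.
fn invalid_data(msg: String) -> io::Error {
    io::Error::new(io::ErrorKind::InvalidData, msg)
}

// Hypothetical helper: parses one "a<TAB>b" line without allocating a Vec.
fn parse_line(line: &str) -> io::Result<Record> {
    // split_once splits at the first tab only, so a line with a third
    // field fails to parse instead of being silently truncated.
    let (a, b) = line
        .split_once('\t')
        .ok_or_else(|| invalid_data(format!("expected two tab-separated fields: {line:?}")))?;
    let a = a.parse::<u32>().map_err(|e| invalid_data(format!("{a:?}: {e}")))?;
    let b = b.parse::<u32>().map_err(|e| invalid_data(format!("{b:?}: {e}")))?;
    Ok((a, b))
}

fn read(content: &[u8]) -> io::Result<Vec<Record>> {
    BufReader::new(content)
        .lines()
        // `line?` propagates I/O errors; parse_line reports malformed input.
        .map(|line| parse_line(&line?))
        // Collecting into a Result stops at the first error.
        .collect()
}
```

With this version, `read(b"1\t1\n2\t2\n")` should give the same two records as before, while a line such as `1\tx` comes back as an `Err` instead of a panic. Note the behavioral difference from the code above: lines with more than two fields are rejected rather than having the extras ignored.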
Upvotes: 1