Reputation: 490
I'm having trouble with opening a file. Most examples read files into a String or read the entire file into a Vec<u8>. What I need is to read a file into chunks of a fixed size and store those chunks in an array (Vec<Vec<u8>>) of chunks.
For example, I have a file called my_file of exactly 64 KB and I want to read it in chunks of 16 KB, so I would end up with a Vec<Vec<u8>> of size 4 where each element is another Vec<u8> of size 16 KB (0x4000 bytes).
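A minimal sketch of the chunk-count arithmetic for this example. It assumes the count is the file length divided by the chunk size and rounded up, so a trailing partial chunk still counts. `chunk_count` and `file_chunk_count` are hypothetical helper names. `std::fs::metadata(...).len()` is one way to get the length without reading the file.

```rust
use std::io;

/// Hypothetical helper: number of chunks needed to cover `len` bytes,
/// rounding up so that a final partial chunk still counts as one chunk.
fn chunk_count(len: u64, chunk_size: u64) -> u64 {
    assert!(chunk_size > 0, "chunk_size must be non-zero");
    len / chunk_size + u64::from(len % chunk_size != 0)
}

/// Hypothetical helper: chunk count for a file on disk, taken from its metadata length.
fn file_chunk_count(path: &str, chunk_size: u64) -> io::Result<u64> {
    let len = std::fs::metadata(path)?.len();
    Ok(chunk_count(len, chunk_size))
}
```

For the 64 KB file and 0x4000-byte chunks this gives 4. One extra byte would make it 5.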
After reading the docs and checking other Stack Overflow answers, I was able to come up with something like this:
use std::io::Read;

let mut file = std::fs::File::open("my_file")?;
// ...calculate num_of_chunks 4 in this case
let mut list_of_chunks = Vec::new();
for _ in 0..num_of_chunks {
    let mut data: [u8; 0x4000] = [0; 0x4000];
    // read_exact fills the whole buffer; a plain read() may return fewer bytes
    file.read_exact(&mut data[..])?;
    list_of_chunks.push(data.to_vec());
}
Although this seems to work fine, it looks a bit convoluted. I read each chunk into an array, copy it into a new Vec<u8>, and then move that Vec<u8> into the list_of_chunks Vec<Vec<u8>>.
I'm not sure if it's idiomatic or even possible, but I'd rather create a Vec<Vec<u8>> with num_of_chunks elements where each element is another Vec<u8> of size 16 KB, and then read each chunk of the file directly into its Vec<u8>. That way there is no copying, and all memory is allocated before reading the file.
Is that approach possible, or is there a better conventional/idiomatic/correct way to do this?
I'm wondering if Vec<Vec<u8>> is the correct type for solving this. I mean, I won't need the array to grow after reading the file.
Upvotes: 17
Views: 14599
Reputation: 1015
Read::read_to_end reads efficiently, directly into a Vec<u8>. If you want it in chunks, combine it with Read::take to limit the number of bytes that read_to_end will read.
Example:
use std::io::Read;

let mut file = std::fs::File::open("your_file")?;
let mut list_of_chunks = Vec::new();
let chunk_size = 0x4000;
loop {
    let mut chunk = Vec::with_capacity(chunk_size);
    let n = file.by_ref().take(chunk_size as u64).read_to_end(&mut chunk)?;
    if n == 0 { break; }
    list_of_chunks.push(chunk);
    if n < chunk_size { break; }
}
The last if is not necessary, but it prevents an extra read call: if read_to_end read fewer than the requested number of bytes, we can expect the next read to return nothing, since we have hit the end of the file.
Upvotes: 29
Reputation: 26697
I think the most idiomatic way would be to use an iterator. The code below is freely inspired by M-ou-se's answer:
use std::io::{self, Read, Seek, SeekFrom};

struct Chunks<R> {
    read: R,
    size: usize,
    hint: (usize, Option<usize>),
}

impl<R> Chunks<R> {
    pub fn new(read: R, size: usize) -> Self {
        Self {
            read,
            size,
            hint: (0, None),
        }
    }

    pub fn from_seek(mut read: R, size: usize) -> io::Result<Self>
    where
        R: Seek,
    {
        // Remember the current position, measure the length, then seek back.
        let old_pos = read.seek(SeekFrom::Current(0))?;
        let len = read.seek(SeekFrom::End(0))?;
        if len != old_pos {
            read.seek(SeekFrom::Start(old_pos))?;
        }
        // saturating_sub: the position may lie past the end of the file
        let rest = len.saturating_sub(old_pos) as usize;
        // Ceiling division: a final partial chunk still counts as one chunk
        let min = rest / size + if rest % size != 0 { 1 } else { 0 };
        Ok(Self {
            read,
            size,
            // Lower bound only: assumes the file does not shrink while iterating;
            // errors can yield extra items, so there is no upper bound
            hint: (min, None),
        })
    }

    // This could be useful if you want to try to recover from an error
    pub fn into_inner(self) -> R {
        self.read
    }
}

impl<R> Iterator for Chunks<R>
where
    R: Read,
{
    type Item = io::Result<Vec<u8>>;

    fn next(&mut self) -> Option<Self::Item> {
        let mut chunk = Vec::with_capacity(self.size);
        // Limit by self.size: with_capacity may allocate more than requested
        match self
            .read
            .by_ref()
            .take(self.size as u64)
            .read_to_end(&mut chunk)
        {
            Ok(n) => {
                if n != 0 {
                    Some(Ok(chunk))
                } else {
                    None
                }
            }
            Err(e) => Some(Err(e)),
        }
    }

    fn size_hint(&self) -> (usize, Option<usize>) {
        self.hint
    }
}

trait ReadPlus: Read {
    fn chunks(self, size: usize) -> Chunks<Self>
    where
        Self: Sized,
    {
        Chunks::new(self, size)
    }
}

impl<T: ?Sized> ReadPlus for T where T: Read {}

fn main() -> io::Result<()> {
    let file = std::fs::File::open("src/main.rs")?;
    // 0xFF is only a test value; use any chunk size (e.g. 0x4000)
    let iter = Chunks::from_seek(file, 0xFF)?;
    println!("{:?}", iter.size_hint());
    // This iterator can keep returning Err if the error persists, so be careful:
    // collecting into a Result stops at the first error
    let chunks = iter.collect::<Result<Vec<_>, _>>()?;
    println!("{:?}, {:?}", chunks.len(), chunks.capacity());
    Ok(())
}
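For comparison, here is a more compact sketch of the same iterator idea that uses `std::iter::from_fn` instead of a dedicated struct. `chunks_iter` is a hypothetical name. Unlike the `Chunks` type above, it stops after the first error, so iterating it directly cannot spin on a persistent `Err`. It does not provide a `size_hint`.

```rust
use std::io::{self, Read};

/// Hypothetical alternative to `Chunks`: yields chunks of at most `size` bytes
/// and ends after the first empty read or the first error.
fn chunks_iter<R: Read>(mut reader: R, size: usize) -> impl Iterator<Item = io::Result<Vec<u8>>> {
    let mut done = false;
    std::iter::from_fn(move || {
        if done {
            return None;
        }
        let mut chunk = Vec::with_capacity(size);
        match reader.by_ref().take(size as u64).read_to_end(&mut chunk) {
            Ok(0) => {
                done = true;
                None
            }
            Ok(_) => Some(Ok(chunk)),
            Err(e) => {
                // Fuse on error so callers never see the same Err repeated forever.
                done = true;
                Some(Err(e))
            }
        }
    })
}
```

Collecting it with `.collect::<io::Result<Vec<_>>>()` gives the same `Vec<Vec<u8>>` as the loop-based answer.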
Upvotes: 6