Reputation: 83
Consider, for the sake of simplicity, that I want to implement an indexable Vector v with n consecutive elements 0,1,...,n-1, i.e. v[i] = i. This vector is supposed to be filled on demand, that is, if v[i] is used and currently the vector contains n < i+1 elements, the values n+1,n+2,...,i are first pushed onto v, and then the reference to v[i] is returned.
Code below works fine.
struct LazyVector {
data: Vec<usize>
}
impl LazyVector {
fn new() -> LazyVector {
LazyVector{
data: vec![]
}
}
fn get(&mut self, i:usize) -> &usize {
for x in self.data.len()..=i {
self.data.push(i);
}
&self.data[i]
}
}
pub fn main() {
let mut v = LazyVector::new();
println!("v[5]={}",v.get(5)); // prints v[5]=5
}
However, the code above is just a mock-up of the actual structure I'm trying to implement. In addition to that, (1) I'd like to be able to use the index operator and, (2) although the vector may actually be modified when accessing a position, I'd like that to be transparent to the user, that is, I'd like to be able to index any position even if I had an immutable reference to v. Immutable references are preferred to prevent other unwanted modifications.
Requirement (1) could be achieved by implementing the Index trait, like so
impl std::ops::Index<usize> for LazyVector {
type Output = usize;
fn index(&self, i: usize) -> &Self::Output {
self.get(i)
}
}
However, this does not compile since we need a mutable reference in order to be able to call LazyVector::get. Because of requirement (2) we do not want to make this reference mutable, and even if we did, we couldn't do that since it would violate the interface of the Index trait. I figured that this would make the case for the interior mutability pattern through the RefCell smart pointer (as in Chapter 15 of The Rust Book). So I came up with something like
struct LazyVector {
data: std::cell::RefCell<Vec<usize>>
}
impl LazyVector {
fn new() -> LazyVector {
LazyVector{
data: std::cell::RefCell::new(vec![])
}
}
fn get(&self, i:usize) -> &usize {
let mut mutref = self.data.borrow_mut();
for x in mutref.len()..=i {
mutref.push(x)
}
&self.data.borrow()[i] // error: cannot return value referencing a temporary value
}
}
However this doesn't work because it tries to return a value referencing the Ref struct returned by borrow() that goes out of scope at the end of LazyVector::get. Finally, to circumvent that, I did something like
struct LazyVector {
data: std::cell::RefCell<Vec<usize>>
}
impl LazyVector {
fn new() -> LazyVector {
LazyVector{
data: std::cell::RefCell::new(vec![])
}
}
fn get(&self, i:usize) -> &usize {
let mut mutref = self.data.borrow_mut();
for x in mutref.len()..=i {
mutref.push(x)
}
unsafe { // Argh!
let ptr = self.data.as_ptr();
&std::ops::Deref::deref(&*ptr)[i]
}
}
}
impl std::ops::Index<usize> for LazyVector {
type Output = usize;
fn index(&self, i: usize) -> &Self::Output {
self.get(i)
}
}
pub fn main() {
let v = LazyVector::new(); // Unmutable!
println!("v[5]={}",v.get(5)); // prints v[5]=5
}
Now it works as required but, as a newbie, I am not so sure about the unsafe block! I think I am effectively wrapping it with a safe interface, but I'm not sure. So my question is whether that is OK or if there is a better, totally safe way to achieve that.
Thanks for any help.
Upvotes: 1
Views: 379
Reputation: 8190
EDIT Since you provided more info on your goal (lazy access to chunks of a huge file that lies on disk), I update my answer.
You can use (as you tried) cells. I quote the doc:
Since cell types enable mutation where it would otherwise be disallowed though, there are occasions when interior mutability might be appropriate, or even must be used, e.g. [...] Implementation details of logically-immutable methods. [...]
Here's a piece of code that does the job (note that's very close to what you wrote):
use std::cell::RefCell;
use std::ops::Index;
// This is your file
const DATA: &str = "Rust. A language empowering everyone to build reliable and efficient software.";
#[derive(Debug)]
struct LazyVector<'a, 'b> {
ref_v: RefCell<&'a mut Vec<&'b str>>
}
impl<'a, 'b> LazyVector<'a, 'b> {
fn new(v: &'a mut Vec<&'b str>) -> LazyVector<'a, 'b> {
LazyVector {
ref_v: RefCell::new(v)
}
}
/// get or load a chunk of two letters
fn get_or_load(&self, i: usize) -> &'b str {
let mut v = self.ref_v.borrow_mut();
for k in v.len()..=i {
v.push(&DATA[k * 2..k * 2 + 2]);
}
v[i]
}
}
impl<'a, 'b> Index<usize> for LazyVector<'a, 'b> {
type Output = str;
fn index(&self, i: usize) -> &Self::Output {
self.get_or_load(i)
}
}
pub fn main() {
let mut v = vec![];
let lv = LazyVector::new(&mut v);
println!("v[5]={}", &lv[5]); // v[5]=ng
println!("{:?}", lv); // LazyVector { ref_v: RefCell { value: ["Ru", "st", ". ", "A ", "la", "ng"] } }
println!("v[10]={}", &lv[10]); // v[10]=ow
println!("{:?}", lv); // LazyVector { ref_v: RefCell { value: ["Ru", "st", ". ", "A ", "la", "ng", "ua", "ge", " e", "mp", "ow"] } }
}
The main difference with your try is that the underlying Vec
is an external mutable vector, and that LazyVector
gets only a (mutable) ref on this vector. A RwLock should be the way to handle concurrent access.
However, I wouldn't recommend that solution:
First, your underlying Vec
will rapidly grow and become as huge as the file on disk. Hence, you'll need a map instead of a vector and to keep the number of chunks in that map under a given boundary. If you ask for a chunk that is not in memory, you'll have to choose a chunk to remove. That's simply Paging and the OS is generally better at this game than you (see page replacement algorithm). As I wrote in a comment, memory mapped files (and maybe shared memory in case of "heavy" processes) would be more efficient: the OS handles the lazy loading of the file and the share of the read only data. R. Sedgewick remark in Algorithms in C, first edition, chapter 13, section "An Easier Way", explains why sorting a huge file (bigger than memory) may be easier than one thought:
In a good virtual-memory system, the programmer can address a very large amount of data, leaving to the system the responsibility of making sure that the adressed data is transferred from external to internal storage when needed.
Second, see my previous answer below.
PREVIOUS ANSWER
I coded this kind of vector once... in Java. The use case was to represent a very sparse grid (many of the rows where only a few cells wide, but the grid was supposed to have a width of 1024). To avoid to have to manually add cells when needed, I created a "list" that was doing roughly what you try to achieve (but there was only one default value).
At first, I made my list implement the List
interface, but I quickly realized that I had to make a lot of useless (and slow) code not to break the Liskov substitution principle. Worse, the behavior of some methods was misleading regarding to the usual lists (ArrayList
, LinkedList
, ...).
It seems you are in the same situation: you would like your LazyVector
to look like a usual Vec
, and that's why you want to implement Index
and maybe IndexMut
traits. But you are looking for workarounds to achieve this (e.g. unsafe
code to match the traits methods signatures).
My advice is: do not try to make LazyVector
look like a usual vector, but make it clear that the LazyVector
is not a usual vector. This is the Principle of least astonishment. E.g. replace get
(expected to only read the data by the user in good faith) by get_or_extend
that makes clear that either you get something, either you create it.
If you add a get_or_extend_mut
function, you have something that is not very attractive but efficient and predictable:
impl LazyVector {
fn new() -> LazyVector { ... }
fn get_or_extend(&mut self, i: usize) -> &usize { ... }
fn get_or_extend_mut(&mut self, i: usize) -> &mut usize { ... }
}
Upvotes: 0