I want to read one line from stdin, then split it on whitespace and process the parts.
A simple read_line would work, since it returns an owned String:
use std::io::stdin;
fn read_line() -> String {
    let mut line = String::new();
    stdin().read_line(&mut line).unwrap();
    line
}
But I can't make it work when I want to consume the String and return an owned Split that outlives the function that creates it.
To be clear, I only want to get the split result (of any suitable type), while the memory allocated for the characters stays alive as long as the split does.
I've tried:
use std::{io::stdin, str::SplitWhitespace};
fn main() {
    let split = read_line_and_split();
}
// doesn't compile: missing lifetime specifier
fn read_line_and_split() -> SplitWhitespace {
    let mut str = String::new();
    stdin().read_line(&mut str).unwrap();
    str.split_whitespace()
}
Creating an owned version of Split is trickier than it seems. For example, let's say you try the obvious:
// doesn't compile
fn owned_split(s: String) -> impl Iterator<Item = String> {
    s.split_whitespace().map(|sub| sub.to_string())
}
That doesn't compile because the value behind impl Iterator is really a Map adapter around SplitWhitespace<'a> (returned by str::split_whitespace()), which holds a reference to the string. This implementation of owned_split() effectively attempts to return a reference to the local variable s, which doesn't compile (nor should it). Ideally we'd instead return a struct that contains both the original string and the SplitWhitespace<'a> iterator that points into it. But that doesn't work because self-referential structs aren't supported by the borrow checker. It could be made to work in "safe" code using one of the self-referencing crates, but let's explore other options first.
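For contrast, the "missing lifetime specifier" error from the question goes away if the function borrows the string instead of owning it: the caller keeps the String alive, and the returned SplitWhitespace<'_> borrows from it, so no self-reference is needed. This is a minimal sketch; `split_line` is a hypothetical helper name, and it only helps when the caller can own the String for as long as the iterator is used.

```rust
use std::str::SplitWhitespace;

// Hypothetical helper: it borrows `line`, so the elided lifetime `'_` ties the
// returned iterator to the caller's string instead of to a local variable.
fn split_line(line: &str) -> SplitWhitespace<'_> {
    line.split_whitespace()
}
```

Usage: `let line = String::from("  hello   world\n"); let words: Vec<&str> = split_line(&line).collect();` works as long as `line` outlives `words`.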
As noted in the comments, the simplest and most obvious option is to just collect into a Vec<String> and be done with it:
let split: Vec<String> = s.split_whitespace().map(|sub| sub.to_owned()).collect();
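Applied to the question, a sketch of `read_line_and_split` built on that one-liner might look like this. Taking `impl BufRead` instead of calling `stdin()` directly is an assumption made so the function can be fed test input; the function name comes from the question.

```rust
use std::io::{self, BufRead};

// Reads one line from any buffered reader and returns its whitespace-separated
// parts as owned Strings. Each part is copied out, so `line` can be dropped
// when the function returns.
fn read_line_and_split(mut reader: impl BufRead) -> io::Result<Vec<String>> {
    let mut line = String::new();
    reader.read_line(&mut line)?;
    Ok(line.split_whitespace().map(str::to_owned).collect())
}
```

With real input you would call it as `read_line_and_split(std::io::stdin().lock())`. The cost of this approach is one allocation per part plus the Vec itself.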
But if your string is really long or you're curious about alternatives as a learning exercise, read on.
An alternative is to reinvoke str::split_whitespace() each time the next split is requested, and return the first item from the remainder of the string. This requires some resourcefulness to figure out where to continue looking for whitespace, for example by looking at the address of the returned substring:
fn owned_split(s: String) -> impl Iterator<Item = String> {
    let mut pos = 0;
    std::iter::from_fn(move || {
        let sub = s[pos..].split_whitespace().next()?;
        // Next search position is at the end of `sub`, but we need it as
        // an index into `s`. Since `sub` is a slice of `s`, we calculate
        // where in `s` its end lies by subtracting the address of the start
        // of `s` from the address of the end of `sub`.
        pos = sub.as_bytes().as_ptr_range().end as usize - s.as_ptr() as usize;
        Some(sub.to_owned())
    })
}
The pointer subtraction looks scary but is still 100% safe, because we're not dereferencing data behind the pointer, just inquiring about its address in order to determine the index.
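If you'd rather avoid address arithmetic altogether, the same re-scanning idea can compute the offset from string lengths instead, using str::trim_start and str::find. This is a swapped-in variation, not the code above: it relies on trim_start, char::is_whitespace and split_whitespace all using the same Unicode White_Space definition.

```rust
// Variation: track the byte offset `pos` using lengths instead of addresses.
fn owned_split(s: String) -> impl Iterator<Item = String> {
    let mut pos = 0; // byte offset into `s` where the next search starts
    std::iter::from_fn(move || {
        let rest = &s[pos..];
        // Skip leading whitespace; the length difference is the number of
        // bytes skipped.
        let trimmed = rest.trim_start();
        if trimmed.is_empty() {
            return None; // only whitespace (or nothing) is left
        }
        let start = pos + (rest.len() - trimmed.len());
        // The word ends at the next whitespace character, or at the end of `s`.
        let len = trimmed.find(char::is_whitespace).unwrap_or(trimmed.len());
        pos = start + len;
        Some(s[start..pos].to_owned())
    })
}
```

All offsets are byte indices that fall on character boundaries (they come from trim_start and find), so the slicing cannot panic on multi-byte text.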
Finally, here is a version that uses self_cell to create a self-referencing struct that contains both the owned string and the splitting iterator that borrows it:
use std::str::SplitWhitespace;
fn owned_split(s: String) -> impl Iterator<Item = String> {
    self_cell::self_cell! {
        struct OwnedSplit {
            owner: String,
            #[not_covariant]
            dependent: SplitWhitespace,
        }
    }
    impl Iterator for OwnedSplit {
        type Item = String;
        fn next(&mut self) -> Option<String> {
            self.with_dependent_mut(|_, split| split.next().map(|s| s.to_owned()))
        }
    }
    OwnedSplit::new(s, |s| s.split_whitespace())
}
This version leaves it to self_cell to generate code that uses unsafe to create the self-referential type in a way that plays well with the borrow checker. Barring a bug in self_cell, such usage of unsafe should either be sound or fail to compile. A version using ouroboros would be very similar and provide the same guarantees, but I'd recommend self_cell because it generates much less code and doesn't use proc macros, resulting in faster compilation.