Reputation: 564
So, I'm working on porting a string tokenizer that I wrote in Python over to Rust, and I've run into an issue I can't seem to get past with lifetimes and structs.
So, the process is basically:
Each file is tokenized into a Vec<String> of tokens, and then Counter and UniCase are used to get counts of individual instances of tokens from each vec.
struct Corpus<'a> {
    words: Counter<UniCase<&'a String>>,
    parts: Vec<CorpusPart<'a>>
}

pub struct CorpusPart<'a> {
    percent_of_total: f32,
    word_count: usize,
    words: Counter<UniCase<&'a String>>
}

fn process_file(entry: &DirEntry) -> CorpusPart {
    let mut contents = read_to_string(entry.path())
        .expect("Could not load contents.");
    let tokens = tokenize(&mut contents);
    let counted_words = collect(&tokens);
    CorpusPart {
        percent_of_total: 0.0,
        word_count: tokens.len(),
        words: counted_words
    }
}

pub fn tokenize(normalized: &mut String) -> Vec<String> {
    // snip ...
}

pub fn collect(results: &Vec<String>) -> Counter<UniCase<&'_ String>> {
    results.iter()
        .map(|w| UniCase::new(w))
        .collect::<Counter<_>>()
}
However, when I try to return CorpusPart, it complains that it is trying to reference a local variable, tokens. How can/should I deal with this? I tried adding lifetime annotations, but couldn't figure it out...
Essentially, I no longer need the Vec<String>, but I do need some of the Strings that were in it for the counter.
Any help is appreciated, thank you!
Upvotes: 2
Views: 146
Reputation: 4123
The issue here is that you are throwing away the Vec<String> at the end of process_file, but still referencing the elements inside it. If you no longer need the Vec<String> but still require some of its contents, you have to transfer ownership of those Strings to something else.
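To make the ownership transfer concrete, here is a minimal sketch that moves each String out of the Vec and into the thing that counts it. It uses a plain std HashMap as a stand-in for Counter and leaves out UniCase entirely. The names count_owned and process are hypothetical, and process splits on whitespace only to have something to count.

```rust
use std::collections::HashMap;

/// Counts tokens, taking ownership of every String in the Vec.
/// The returned map does not borrow from `tokens`, so it can outlive it.
fn count_owned(tokens: Vec<String>) -> HashMap<String, usize> {
    let mut counts = HashMap::new();
    for token in tokens {
        // `token` is moved into the map as a key (or dropped if already present).
        *counts.entry(token).or_insert(0) += 1;
    }
    counts
}

/// Hypothetical stand-in for process_file: tokenizes on whitespace,
/// records the token count, then hands the Vec over to count_owned.
fn process(text: &str) -> (usize, HashMap<String, usize>) {
    let tokens: Vec<String> = text.split_whitespace().map(String::from).collect();
    let word_count = tokens.len(); // read the length before `tokens` is moved
    (word_count, count_owned(tokens))
}
```

Because the map owns its keys, nothing in the return value points back at a local variable, which is what the borrow checker was objecting to.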
I assume you want Corpus and CorpusPart to both point to the same Strings, so you are not duplicating Strings needlessly. If that is the case, either Corpus or CorpusPart must own the Strings, and the one that doesn't own them references the Strings owned by the other. (It sounds more complicated than it actually is.)
I will assume CorpusPart owns the Strings, and Corpus just points to them:
use std::fs::DirEntry;
use std::fs::read_to_string;

pub struct UniCase<a> {
    test: a
}

impl<a> UniCase<a> {
    fn new(item: a) -> UniCase<a> {
        UniCase {
            test: item
        }
    }
}

type Counter<a> = Vec<a>;

struct Corpus<'a> {
    words: Counter<UniCase<&'a String>>, // Will reference the strings in CorpusPart (I assume you implemented this elsewhere)
    parts: Vec<CorpusPart>
}

pub struct CorpusPart {
    percent_of_total: f32,
    word_count: usize,
    words: Counter<UniCase<String>> // Has ownership of the strings
}

fn process_file(entry: &DirEntry) -> CorpusPart {
    let mut contents = read_to_string(entry.path())
        .expect("Could not load contents.");
    let tokens = tokenize(&mut contents);
    let length = tokens.len(); // Cache the length, as tokens will no longer be valid once passed to collect
    let counted_words = collect(tokens);
    CorpusPart {
        percent_of_total: 0.0,
        word_count: length,
        words: counted_words
    }
}

pub fn tokenize(normalized: &mut String) -> Vec<String> {
    Vec::new()
}

pub fn collect(results: Vec<String>) -> Counter<UniCase<String>> {
    results.into_iter() // Use into_iter() to consume the Vec that is passed in, and take ownership of the internal items
        .map(|w| UniCase::new(w))
        .collect::<Counter<_>>()
}
I aliased Counter<a> to Vec<a>, as I don't know what Counter you are using.
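For the Corpus side, one caveat: a struct in safe Rust cannot normally hold references into its own fields, so keeping parts and a words field that borrows from those same parts in one struct tends to fight the borrow checker. A possible way around it, sketched here under my own assumptions, is to keep the parts on their own and build the borrowed totals as a separate value. HashMap again stands in for Counter, and total_counts is a hypothetical helper, not something from your code.

```rust
use std::collections::HashMap;

/// Simplified CorpusPart: owns its Strings (HashMap stands in for Counter).
struct CorpusPart {
    words: HashMap<String, usize>,
}

/// Sums the counts across all parts. The keys borrow from the Strings
/// owned by `parts`, so no String is cloned; the result cannot outlive `parts`.
fn total_counts<'a>(parts: &'a [CorpusPart]) -> HashMap<&'a str, usize> {
    let mut totals = HashMap::new();
    for part in parts {
        for (word, count) in &part.words {
            *totals.entry(word.as_str()).or_insert(0) += *count;
        }
    }
    totals
}
```

The lifetime 'a ties the returned map to the slice of parts, so the compiler will stop you from dropping or mutating the parts while the totals are still in use.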
Upvotes: 3