Reputation: 91
I'm trying to extend bluss's rust-itertools with SQL-like join iterators. I encountered a particular problem with RIGHT OUTER JOIN using a hash join strategy (the strategy itself is actually very simple).
The iterator adaptor struct takes 2 input iterators of which the second (the right) is loaded into the HashMap. The iteration works as follows:
The problem is with the second part where I tried to store the map's Values iterator along with the map to store its iteration state. But as I learned in this answer, it's not possible in rust. Unfortunately I have no idea how it could be done otherwise.
Here is the complete code for the INNER JOIN adaptor, which does the first part:
use std::collections::HashMap;
use std::hash::Hash;
pub struct HashJoinInner<I, K, V0, V1> where
I: Iterator<Item=(K, V0)>,
K: Hash + Eq,
V1: Clone,
{
left: I,
right: HashMap<K, V1>,
}
impl<I, K, V0, V1> HashJoinInner<I, K, V0, V1> where
I: Iterator<Item=(K, V0)>,
K: Hash + Eq,
V1: Clone,
{
/// Create a `HashJoinInner` iterator.
pub fn new<J>(l: I, r: J) -> Self
where J: Iterator<Item=(K, V1)>
{
let mut hm: HashMap<K, V1> = HashMap::new();
for (k, v) in r {
hm.insert(k, v);
}
HashJoinInner {
left: l,
right: hm,
}
}
}
impl<I, K, V0, V1> Iterator for HashJoinInner<I, K, V0, V1> where
I: Iterator<Item=(K, V0)>,
K: Hash + Eq,
V1: Clone,
{
type Item = (V0, V1);
fn next(&mut self) -> Option<Self::Item> {
loop {
match self.left.next() {
Some((k0, v0)) => match self.right.get(&k0) {
Some(v1) => return Some((v0, Clone::clone(v1))),
None => continue,
},
None => return None,
}
}
}
}
I'll be grateful for any idea.
Upvotes: 5
Views: 811
Reputation: 91
Thanks to Shepmaster's idea of using std::collections::hash_map::IntoIter
I've managed to solve the problem.
Here is the complete solution for RIGHT OUTER JOIN using the hash join strategy:
use std::collections::hash_map::{HashMap, IntoIter,};
use std::mem;
use std::hash::Hash;
#[must_use = "iterator adaptors are lazy and do nothing unless consumed"]
pub struct HashJoinRightOuter<L, K, RV> {
left: L,
map: HashMap<K, (RV, bool)>,
/// exclusion iterator - yields the unmatched values from the map. It is created once the left
/// iterator is exhausted
excl_iter: Option<IntoIter<K, (RV, bool)>>,
}
impl<L, K, RV> HashJoinRightOuter<L, K, RV>
where K: Hash + Eq,
{
/// Create a `HashJoinRightOuter` iterator.
pub fn new<LI, RI>(left: LI, right: RI) -> Self
where L: Iterator<Item=LI::Item>,
LI: IntoIterator<IntoIter=L>,
RI: IntoIterator<Item=(K, RV)>
{
let mut map: HashMap<K, (RV, bool)> = HashMap::new();
for (k, v) in right.into_iter() {
map.insert(k, (v, false));
}
HashJoinRightOuter {
left: left.into_iter(),
map: map,
excl_iter: None,
}
}
/// Moves the map to `self.excl_iter`
///
/// Once the left iterator is exhausted, the info about which keys were matched is complete.
/// To be able to iterate over map's values we need to move it into its `IntoIter`.
fn set_excl_iter(&mut self) {
let map = mem::replace(&mut self.map, HashMap::<K, (RV, bool)>::new());
self.excl_iter = Some(map.into_iter());
}
}
impl<L, K, LV, RV> Iterator for HashJoinRightOuter<L, K, RV>
where L: Iterator<Item=(K, LV)>,
K: Hash + Eq,
RV: Clone,
{
type Item = (Option<LV>, RV);
fn next(&mut self) -> Option<Self::Item> {
loop {
match self.excl_iter {
// the left iterator is not yet exhausted
None => match self.left.next() {
Some((lk, lv)) => match self.map.get_mut(&lk) {
Some(rt) => {
rt.1 = true; // flag as matched
return Some((Some(lv), Clone::clone(&rt.0)))
},
None => continue, // not interested in unmatched left value
},
// the left iterator is exhausted so move the map into `self.excl_iter`.
None => self.set_excl_iter(),
},
// iterate over unmatched values
Some(ref mut r) => match r.next() {
Some((_, (rv, matched))) => {
if !matched {
return Some((None, rv));
} else {
continue;
}
},
None => return None,
}
}
}
}
}
fn main() {
let a = (0..).zip("AB".chars());
let b = (1..).zip("XY".chars());
let mut it = HashJoinRightOuter::new(a, b);
assert_eq!(it.next(), Some((Some('B'), 'X')));
assert_eq!(it.next(), Some((None, 'Y')));
assert_eq!(it.next(), None);
}
At the beginning I failed because I tried to store both the data and it's reference in the same struct, which has no meaning anyway. What I really wanted was to store the data first, do some magic with it and once done, move it into another field to work with its transformation.
This can be used to solve other self-referencing struct problems as well.
Upvotes: 1
Reputation: 430554
You cannot store the Values
iterator because it contains references to the HashMap
. These references could become invalid if you move the map. However, you can consume the HashMap
using the into_iter
method. That owns all the values of the HashMap
and can be moved into a new struct.
Here's a tweaking of your code from the earlier question. This isn't yet a left or right join. There's complexity about the switch from being done with one iterator to finishing off the other iterator.
use std::collections::hash_map::{HashMap, IntoIter};
use std::hash::Hash;
struct Foo<K, V>
where K: Hash + Eq,
V: Clone,
{
iter: IntoIter<K, (V, bool)>,
}
impl<K, V> Foo<K, V>
where K: Hash + Eq,
V: Clone,
{
fn new<I>(it: I) -> Self
where I: Iterator<Item=(K, V)>
{
let mut map = HashMap::new();
for (k, v) in it {
map.insert(k, (v, false));
}
Foo { iter: map.into_iter() }
}
}
impl<K, V> Iterator for Foo<K, V>
where K: Hash + Eq,
V: Clone,
{
type Item = V;
fn next(&mut self) -> Option<Self::Item> {
loop {
match self.iter.next() {
Some((_, (v, false))) => return Some(v.clone()),
Some(_) => continue,
None => return None,
}
}
}
}
fn main() {
let it = (0..).zip("AB".chars());
let foo = Foo::new(it);
for v in foo {
println!("{}", v);
}
}
However you don't need to do any of that to get what you want. You can simply create a hashmap and check it as you iterate over the other item. I accidentally created a left outer join, but just flip the arguments to get a right outer join. ^_^
use std::collections::hash_map::HashMap;
use std::hash::Hash;
struct LeftOuterJoin<L, K, RV> {
left: L,
right: HashMap<K, RV>,
}
impl<L, K, RV> LeftOuterJoin<L, K, RV>
where K: Hash + Eq
{
fn new<LI, RI>(left: LI, right: RI) -> Self
where L: Iterator<Item=LI::Item>,
LI: IntoIterator<IntoIter=L>,
RI: IntoIterator<Item=(K, RV)>
{
LeftOuterJoin {
left: left.into_iter(),
right: right.into_iter().collect()
}
}
}
impl<L, K, LV, RV> Iterator for LeftOuterJoin<L, K, RV>
where L: Iterator<Item=(K, LV)>,
K: Hash + Eq,
RV: Clone
{
type Item = (K, LV, Option<RV>);
fn next(&mut self) -> Option<Self::Item> {
match self.left.next() {
Some((k, lv)) => {
let rv = self.right.get(&k);
Some((k, lv, rv.cloned()))
},
None => None,
}
}
}
fn main() {
let mut left = HashMap::new();
left.insert(1, "Alice");
left.insert(2, "Bob");
let mut right = HashMap::new();
right.insert(1, "Programmer");
for (id, name, job) in LeftOuterJoin::new(left.into_iter(), right) {
println!("{} ({}) is a {:?}", name, id, job);
}
}
Upvotes: 1