milancio
milancio

Reputation: 91

An iterator adaptor implementing an SQL-like RIGHT OUTER JOIN using a HashMap

I'm trying to extend bluss's rust-itertools with SQL-like join iterators. I encountered a particular problem with RIGHT OUTER JOIN using a hash join strategy (the strategy itself is actually very simple).

The iterator adaptor struct takes 2 input iterators of which the second (the right) is loaded into the HashMap. The iteration works as follows:

  1. The item from the left iterator is matched against the map - in case of a match return both items
  2. When the left iterator is exhausted, return the non-matched values from the map

The problem is with the second part where I tried to store the map's Values iterator along with the map to store its iteration state. But as I learned in this answer, it's not possible in rust. Unfortunately I have no idea how it could be done otherwise.

Here is the complete code for the INNER JOIN adaptor, which does the first part:

use std::collections::HashMap;
use std::hash::Hash;

pub struct HashJoinInner<I, K, V0, V1> where
    I: Iterator<Item=(K, V0)>,
    K: Hash + Eq,
    V1: Clone,
{
    left: I,
    right: HashMap<K, V1>,
}

impl<I, K, V0, V1> HashJoinInner<I, K, V0, V1> where
    I: Iterator<Item=(K, V0)>,
    K: Hash + Eq,
    V1: Clone,
{
    /// Create a `HashJoinInner` iterator.
    pub fn new<J>(l: I, r: J) -> Self
        where J: Iterator<Item=(K, V1)>
    {
        let mut hm: HashMap<K, V1> = HashMap::new();
        for (k, v) in r {
            hm.insert(k, v);
        }
        HashJoinInner {
            left: l,
            right: hm,
        }
    }
}

impl<I, K, V0, V1> Iterator for HashJoinInner<I, K, V0, V1> where
    I: Iterator<Item=(K, V0)>,
    K: Hash + Eq,
    V1: Clone,
{
    type Item = (V0, V1);

    fn next(&mut self) -> Option<Self::Item> {
        loop {
            match self.left.next() {
                Some((k0, v0)) => match self.right.get(&k0) {
                    Some(v1) => return Some((v0, Clone::clone(v1))),
                    None => continue,
                },
                None => return None,
            }
        }
    }
}

I'll be grateful for any idea.

Upvotes: 5

Views: 811

Answers (2)

milancio
milancio

Reputation: 91

Thanks to Shepmaster's idea of using std::collections::hash_map::IntoIter I've managed to solve the problem.

Here is the complete solution for RIGHT OUTER JOIN using the hash join strategy:

use std::collections::hash_map::{HashMap, IntoIter,};
use std::mem;
use std::hash::Hash;

#[must_use = "iterator adaptors are lazy and do nothing unless consumed"]
pub struct HashJoinRightOuter<L, K, RV> {
    left: L,
    map: HashMap<K, (RV, bool)>,
    /// exclusion iterator - yields the unmatched values from the map. It is created once the left
    /// iterator is exhausted
    excl_iter: Option<IntoIter<K, (RV, bool)>>,
}

impl<L, K, RV> HashJoinRightOuter<L, K, RV> 
    where K: Hash + Eq,
{
    /// Create a `HashJoinRightOuter` iterator.
    pub fn new<LI, RI>(left: LI, right: RI) -> Self
        where L: Iterator<Item=LI::Item>,
              LI: IntoIterator<IntoIter=L>,
              RI: IntoIterator<Item=(K, RV)>
    {
        let mut map: HashMap<K, (RV, bool)> = HashMap::new();
        for (k, v) in right.into_iter() {
            map.insert(k, (v, false));
        }
        HashJoinRightOuter {
            left: left.into_iter(),
            map: map,
            excl_iter: None,
        }
    }

    /// Moves the map to `self.excl_iter`
    ///
    /// Once the left iterator is exhausted, the info about which keys were matched is complete.
    /// To be able to iterate over map's values we need to move it into its `IntoIter`.
    fn set_excl_iter(&mut self) {
        let map = mem::replace(&mut self.map, HashMap::<K, (RV, bool)>::new());
        self.excl_iter = Some(map.into_iter());
    }
}

impl<L, K, LV, RV> Iterator for HashJoinRightOuter<L, K, RV> 
    where L: Iterator<Item=(K, LV)>,
          K: Hash + Eq,
          RV: Clone,
{
    type Item = (Option<LV>, RV);

    fn next(&mut self) -> Option<Self::Item> {
        loop {
            match self.excl_iter {
                // the left iterator is not yet exhausted
                None => match self.left.next() {
                    Some((lk, lv)) => match self.map.get_mut(&lk) {
                        Some(rt) => {
                            rt.1 = true; // flag as matched
                            return Some((Some(lv), Clone::clone(&rt.0)))
                        },
                        None => continue, // not interested in unmatched left value
                    },
                    // the left iterator is exhausted so move the map into `self.excl_iter`.
                    None => self.set_excl_iter(),
                },
                // iterate over unmatched values
                Some(ref mut r) => match r.next() {
                    Some((_, (rv, matched))) => {
                        if !matched {
                            return Some((None, rv));
                        } else {
                            continue;
                        }
                    },
                    None => return None,
                }
            }
        }
    }
}

fn main() {
    let a = (0..).zip("AB".chars());
    let b = (1..).zip("XY".chars());
    let mut it = HashJoinRightOuter::new(a, b);
    assert_eq!(it.next(), Some((Some('B'), 'X')));
    assert_eq!(it.next(), Some((None, 'Y')));
    assert_eq!(it.next(), None);
}

At the beginning I failed because I tried to store both the data and it's reference in the same struct, which has no meaning anyway. What I really wanted was to store the data first, do some magic with it and once done, move it into another field to work with its transformation.

This can be used to solve other self-referencing struct problems as well.

Upvotes: 1

Shepmaster
Shepmaster

Reputation: 430554

You cannot store the Values iterator because it contains references to the HashMap. These references could become invalid if you move the map. However, you can consume the HashMap using the into_iter method. That owns all the values of the HashMap and can be moved into a new struct.

Here's a tweaking of your code from the earlier question. This isn't yet a left or right join. There's complexity about the switch from being done with one iterator to finishing off the other iterator.

use std::collections::hash_map::{HashMap, IntoIter};
use std::hash::Hash;

struct Foo<K, V>
    where K: Hash + Eq,
          V: Clone,
{
    iter: IntoIter<K, (V, bool)>,
}

impl<K, V> Foo<K, V>
    where K: Hash + Eq,
          V: Clone,
{
    fn new<I>(it: I) -> Self
        where I: Iterator<Item=(K, V)>
    {
        let mut map = HashMap::new();
        for (k, v) in it {
            map.insert(k, (v, false));
        }
        Foo { iter: map.into_iter() }
    }
}

impl<K, V> Iterator for Foo<K, V>
    where K: Hash + Eq,
          V: Clone,
{
    type Item = V;
    fn next(&mut self) -> Option<Self::Item> {
        loop {
            match self.iter.next() {
                Some((_, (v, false))) => return Some(v.clone()),
                Some(_) => continue,
                None => return None,
            }
        }
    }
}

fn main() {
    let it = (0..).zip("AB".chars());
    let foo = Foo::new(it);
    for v in foo {
        println!("{}", v);
    }
}

However you don't need to do any of that to get what you want. You can simply create a hashmap and check it as you iterate over the other item. I accidentally created a left outer join, but just flip the arguments to get a right outer join. ^_^

use std::collections::hash_map::HashMap;
use std::hash::Hash;

struct LeftOuterJoin<L, K, RV> {
    left: L,
    right: HashMap<K, RV>,
}

impl<L, K, RV> LeftOuterJoin<L, K, RV> 
    where K: Hash + Eq
{
    fn new<LI, RI>(left: LI, right: RI) -> Self
        where L: Iterator<Item=LI::Item>,
              LI: IntoIterator<IntoIter=L>,
              RI: IntoIterator<Item=(K, RV)>
    {
        LeftOuterJoin {
            left: left.into_iter(),
            right: right.into_iter().collect()
        }
    }
}

impl<L, K, LV, RV> Iterator for LeftOuterJoin<L, K, RV>
    where L: Iterator<Item=(K, LV)>,
          K: Hash + Eq,
          RV: Clone
{
    type Item = (K, LV, Option<RV>);

    fn next(&mut self) -> Option<Self::Item> {
        match self.left.next() {
            Some((k, lv)) => {
                let rv = self.right.get(&k);
                Some((k, lv, rv.cloned()))
            },
            None => None,
        }
    }
}

fn main() {
    let mut left = HashMap::new();
    left.insert(1, "Alice");
    left.insert(2, "Bob");

    let mut right = HashMap::new();
    right.insert(1, "Programmer");

    for (id, name, job) in LeftOuterJoin::new(left.into_iter(), right) {
        println!("{} ({}) is a {:?}", name, id, job);
    }
}

Upvotes: 1

Related Questions