user754425
user754425

Reputation: 31

How can you simulate a SQL join in C++ using STL and or Boost

How can you simulate a SQL join between two dynamic data sets ( i.e. data is obtained at runtime ) using c++.
Example Table A is a 2D vector of vectors ( any STL or Boost data structure is OK ) of students, their names and course numbers. Table B is a 2D vector of vectors ( any STL or Boost data structure is ok) of course number, description and room numbers

//Table A
// Columns: StudentID   FirstName   LastName    CourseNum
std::vector<std::string> a1 = boost::assign::list_of("3490")( "Saundra")( "Bribiesca")( "F100X");
std::vector<std::string> a2 = boost::assign::list_of("1288")( "Guy")( "Shippy")( "F103X");
std::vector<std::string> a3 = boost::assign::list_of("5383")( "Tia")( "Roache")( "F103X");
std::vector<std::string> a4 = boost::assign::list_of("5746")( "Jamie")( "Grunden")( "F101X");
std::vector<std::string> a5 = boost::assign::list_of("2341")( "Emilia")( "Hankinson")( "F120X");

std::vector<std::vector<std::string > > TableA = boost::assign::list_of(a1)(a2)(a3)(a4)(a5);

//Table B
//Columns: CourseNum    CourseDesc  Room
 std::vector<std::string> b1 = boost::assign::list_of("F100X")("Human Biology")("400B");
 std::vector<std::string> b2 = boost::assign::list_of("F103X")("Biology and Society")("500B");
 std::vector<std::string> b3 = boost::assign::list_of("F101X")("The Dynamic Earth 340A");
 std::vector<std::string> b4 = boost::assign::list_of("F120X")("Glaciers, Earthquakes and Volcanoes")("300C");Earthquakes and Volcanoes");


 std::vector<std::vector<std::string > > TableB = boost::assign::list_of(b1)(b2)(b3)(b4);

  //Table C ( result of joining A and B ) using TableA[3] and TableB[0] as key
//I want to produce a resultset Table C, like this


Table C
StudentID   FirstName   LastName    Room    CourseNum   CourseDesc
3490    Saundra Bribiesca   400B    F100X   Human Biology
1288    Guy Shippy  500B    F103X   Biology and Society
5383    Tia Roache  500B    F103X   Biology and Society
5746    Jamie   Grunden 340A    F101X   The Dynamic Earth
2341    Emilia  Hankinson    300C   F120X   Glaciers, Earthquakes and Volcanoes

Upvotes: 3

Views: 5838

Answers (2)

NoSenseEtAl
NoSenseEtAl

Reputation: 30058

Since only answer doesn't have any code...

Please note that design (vector of strings instead of class makes code unreadable)

int main()
{  
    map<string,std::vector<vector<string > >::const_iterator> mapB;
    for(auto it = TableB.cbegin(); it!=TableB.cend(); ++it)
    { 
        mapB[(*it)[0]]=it;// in first map we put primary key and iterator to tableB where that key is
    }
    assert(mapB.size()== TableB.size());// how unique is primary key?
    for_each(TableA.cbegin(), TableA.cend(),
    [&mapB] (const vector<string>& entryA )
    {
        auto itB= mapB.find(entryA.at(3));
        if (itB!=mapB.end()) // if we can make "JOIN"  we do it
        {
            auto entryB = itB->second;
            cout << entryA.at(0) << " " << entryA.at(1) << "  " << entryA.at(2) << "  " << entryB->at(2) << "  " << entryB->at(0) << "  " << entryB->at(1) << endl;
        }
    });
}

Output:

C:\STL\MinGW>g++ test.cpp &&a.exe

3490 Saundra  Bribiesca  400B  F100X  Human Biology  
1288 Guy  Shippy  500B  F103X  Biology and Society  
5383 Tia  Roache  500B  F103X  Biology and Society  
5746 Jamie  Grunden  340A  F101X  The Dynamic Earth  
2341 Emilia  Hankinson  300C  F120X  Glaciers, Earthquakes and Volcanoes  

Upvotes: 4

Steve Jessop
Steve Jessop

Reputation: 279295

SQL engines use various different techniques to perform joins, depending what indexes are available (or what hashtables it thinks should be created on the fly).

The simplest though is an O(N*M) nested loop over both tables. So to do an inner join you compare every pair of elements one from A and one from B. When you see a match, output a row.

If you need to speed things up, in this case you could create an "index" of table B on its first column, that is a std::multimap with the first column as the key, and a tuple[*] of the rest of the columns as value. Then for each row in A, look up its third column in the index and output one row per match. If the CourseNum is unique in table B, as seems sensible, then you can use a map rather than a multimap.

Either way gets you from O(N*M) to O((N+M)*logM), which is an improvement unless N (the size of table A) is very small. If your college has very many fewer students than courses, something is badly wrong ;-)

[*] where by "tuple" I mean anything that holds all the values - you've been using vectors, and that will do.

Upvotes: 5

Related Questions