Reputation: 31
How can you simulate a SQL join between two dynamic data sets ( i.e. data is obtained at runtime ) using c++.
Example Table A is a 2D vector of vectors ( any STL or Boost data structure is OK ) of students, their names and course numbers. Table B is a 2D vector of vectors ( any STL or Boost data structure is ok) of course number, description and room numbers
//Table A
// Columns: StudentID FirstName LastName CourseNum
std::vector<std::string> a1 = boost::assign::list_of("3490")( "Saundra")( "Bribiesca")( "F100X");
std::vector<std::string> a2 = boost::assign::list_of("1288")( "Guy")( "Shippy")( "F103X");
std::vector<std::string> a3 = boost::assign::list_of("5383")( "Tia")( "Roache")( "F103X");
std::vector<std::string> a4 = boost::assign::list_of("5746")( "Jamie")( "Grunden")( "F101X");
std::vector<std::string> a5 = boost::assign::list_of("2341")( "Emilia")( "Hankinson")( "F120X");
std::vector<std::vector<std::string > > TableA = boost::assign::list_of(a1)(a2)(a3)(a4)(a5);
//Table B
//Columns: CourseNum CourseDesc Room
std::vector<std::string> b1 = boost::assign::list_of("F100X")("Human Biology")("400B");
std::vector<std::string> b2 = boost::assign::list_of("F103X")("Biology and Society")("500B");
std::vector<std::string> b3 = boost::assign::list_of("F101X")("The Dynamic Earth 340A");
std::vector<std::string> b4 = boost::assign::list_of("F120X")("Glaciers, Earthquakes and Volcanoes")("300C");Earthquakes and Volcanoes");
std::vector<std::vector<std::string > > TableB = boost::assign::list_of(b1)(b2)(b3)(b4);
//Table C ( result of joining A and B ) using TableA[3] and TableB[0] as key
//I want to produce a resultset Table C, like this
Table C StudentID FirstName LastName Room CourseNum CourseDesc 3490 Saundra Bribiesca 400B F100X Human Biology 1288 Guy Shippy 500B F103X Biology and Society 5383 Tia Roache 500B F103X Biology and Society 5746 Jamie Grunden 340A F101X The Dynamic Earth 2341 Emilia Hankinson 300C F120X Glaciers, Earthquakes and Volcanoes
Upvotes: 3
Views: 5838
Reputation: 30058
Since only answer doesn't have any code...
Please note that design (vector of strings instead of class makes code unreadable)
int main()
{
map<string,std::vector<vector<string > >::const_iterator> mapB;
for(auto it = TableB.cbegin(); it!=TableB.cend(); ++it)
{
mapB[(*it)[0]]=it;// in first map we put primary key and iterator to tableB where that key is
}
assert(mapB.size()== TableB.size());// how unique is primary key?
for_each(TableA.cbegin(), TableA.cend(),
[&mapB] (const vector<string>& entryA )
{
auto itB= mapB.find(entryA.at(3));
if (itB!=mapB.end()) // if we can make "JOIN" we do it
{
auto entryB = itB->second;
cout << entryA.at(0) << " " << entryA.at(1) << " " << entryA.at(2) << " " << entryB->at(2) << " " << entryB->at(0) << " " << entryB->at(1) << endl;
}
});
}
Output:
C:\STL\MinGW>g++ test.cpp &&a.exe
3490 Saundra Bribiesca 400B F100X Human Biology
1288 Guy Shippy 500B F103X Biology and Society
5383 Tia Roache 500B F103X Biology and Society
5746 Jamie Grunden 340A F101X The Dynamic Earth
2341 Emilia Hankinson 300C F120X Glaciers, Earthquakes and Volcanoes
Upvotes: 4
Reputation: 279295
SQL engines use various different techniques to perform joins, depending what indexes are available (or what hashtables it thinks should be created on the fly).
The simplest though is an O(N*M) nested loop over both tables. So to do an inner join you compare every pair of elements one from A and one from B. When you see a match, output a row.
If you need to speed things up, in this case you could create an "index" of table B on its first column, that is a std::multimap
with the first column as the key, and a tuple[*] of the rest of the columns as value. Then for each row in A, look up its third column in the index and output one row per match. If the CourseNum is unique in table B, as seems sensible, then you can use a map
rather than a multimap
.
Either way gets you from O(N*M)
to O((N+M)*logM)
, which is an improvement unless N (the size of table A) is very small. If your college has very many fewer students than courses, something is badly wrong ;-)
[*] where by "tuple" I mean anything that holds all the values - you've been using vectors, and that will do.
Upvotes: 5