Post Self
Post Self

Reputation: 1554

What data structure should I use to model a database/table?

Databases usually look like this

┃Name|Age|..┃
┠────┼───┼──┨
┃John│025│..┃
┃Carl│033│..┃
┃....│...│..┃

In this case I mean a table with a fixed column size and a variable size of unsorted rows which can be addressed by an id.

Is there a data structure in C++11 (or earlier) that can represent data like this?

I thought of a couple of ways of cheating such a structure, but none of them is perfect.

1. Separate std::vector

std::vector<std::string> name;
std::vector<unsigned int> age;

// Write
name.push_back("John");
age.push_back(25);

// Read
std::cout << "The first entry is (" << name[0] << " | " << age[0] << ")\n";

Defining a table with many columns takes a lot of markup, though, and writing to it by calling push_back on each std::vector is really tedeous.

2. std::vector of std::tuple

(std::pair would be enough in this case)

std::vector<std::tuple<std::string, unsigned int>> table;

// Write
table.push_back(std::make_tuple("John", 25));

// Read 1
std::string name;
unsigned int age;
std::tie(name, age) = table[0];
std::cout << "The first entry is (" << name << " | " << age << ")\n";

// Read 2
enum
{
    NAME = 0,
    AGE
}
std::cout << "The first entry is (" << std::get<NAME>(table[0]) << " | "
<< std::get<AGE>(table[0]) << ")\n";

(Sorry, if I messed something up here; I've known about the existence of std::tuple since yesterday)

This is fine, but reading from it takes a lot of markup, this time, when you have to define new variables that you want to put the values in. You could just do the std::tie to whatever variable you need the values at, but that becomes unreadable. The second method is almost perfect, but using implicit enums is not something I want to do in C++11.

3. std::vector of std::array

enum
{
    NAME = 0,
    AGE
}

std::vector<std::array<std::string, 2> table;

// Write
table.push_back({"John", "25"});

// Read
std::cout << "The first entry is (" << table[0][NAME] << " | " << table[0][AGE] << ")\n";

This is also pretty good, but it suffers the same problem as 2.2 did. Also this only allows std::string values. In exchange it offers shorter and nicer syntax, though.

Upvotes: 9

Views: 9409

Answers (4)

Post Self
Post Self

Reputation: 1554

Since this question is still seeing activity after 5 years, I will answer it myself with the 5 years of programming experience since then.

There is no point in using std::array or std::tuple to represent a row. A struct is more expressive and has less boilerplate.

The question basically asks the difference between AOS (array of structs), SOA (struct of arrays) and hash maps. The choice between the two depends on what you want to do with the data and how much data there is.

Data is loaded cache lines of 64 bytes with automatic cache preloading when iterating over the data. Since the hash map is non-contiguous and thus cannot take advantage of the caching mechanism, it should only be preferred when individual entries are accessed based on their keys rather than iterating over the entire data set.

The choice between AOS and SOA depends on whether the columns are iterated over individually or together. When iterating separately, by using SOA we can fit more of the same data on the cache line by not looking at the other arrays. For both AOS and SOA, the preferable data structure is std::vector which is a simple contiguous data structure and thus the processor has a lot of information for on-the-fly optimizations.

Upvotes: 1

Amit Vujic
Amit Vujic

Reputation: 1788

#-------------------------------------------
#Try with this piece of code based on <map>
#-------------------------------------------    
#include <iostream>
#include <map>
using namespace std;
//---
class myBook
{
public:
    string ISBN;
    string Title;
    int Pages;

    myBook();
    myBook(const myBook &);
    ~myBook(){};
    myBook &operator=(const myBook &pTr);
    int operator==(const myBook &pTr) const;
    int operator<(const myBook &pTr) const;
};

myBook::myBook()
{
    ISBN = "";
    Title = "";
    Pages = 0;
}


myBook::myBook(const myBook &copyin)
{
    ISBN = copyin.ISBN;
    Title = copyin.Title;
    Pages = copyin.Pages;
}


ostream &operator<<(ostream &output, const myBook &myBook)
{
    output << myBook.ISBN << ':' << myBook.Title << ':' << myBook.Pages << std::endl;
    return(output);
}


myBook& myBook::operator=(const myBook &pTr)
{
    this->ISBN = pTr.ISBN;
    this->Title = pTr.Title;
    this->Pages = pTr.Pages;
    return(*this);
}


int myBook::operator==(const myBook &pTr) const
{
    if( this->ISBN != pTr.ISBN) return(0);
    if( this->Title != pTr.Title) return(0);
    if( this->Pages != pTr.Pages) return(0);
    return(1);
   }

//---------------------------
main()
{
    map<string, myBook> BooksLibrary;
    myBook book1;
    //---
    book1.ISBN="1243954-23";
    book1.Title="Autobiography by Ben Franklin";
    book1.Pages=432;
    BooksLibrary["0001"] = book1;
    //---
    book1.ISBN="555-USA991";
    book1.Title="I Robot by Isac Asimov";
    book1.Pages=323;
    BooksLibrary["0002"] = book1;
    //---
    for( map<string, myBook>::iterator ii=BooksLibrary.begin();     ii!=BooksLibrary.end(); ii++)
    {
        std::cout << (*ii).first << "|" << (*ii).second << endl;
    }
    //---
    return(0);
}

Upvotes: 0

Thomas Matthews
Thomas Matthews

Reputation: 57708

I suggest a std::vector<Record> to hold your records.

Use std::map<key, vector_index> as an index into your records. This will enable you to access records by different search criteria without sorting the vector all the time.

Upvotes: 4

doctorlove
doctorlove

Reputation: 19252

One thing you have't considered is using a std::vector of some kind of associative array, be that a std::map or std::unordered_map.

This would allow you to inspect a given item in the vector (or other sequential container) using the database column names

item["Name"]
item["Age"]

and so on. You would have to use a variant or something like boost::any for the value, obviously.

If you wish to talk to a database in C++, and know at compile time the type the columns have (as you appear to from your suggestions) it might be worth considering ways to generate code with appropriate structures for you directly from the data base scehema. There are a couple of questions around this topic already here and here. What if the db value is null?

Upvotes: 1

Related Questions