Samd

Reputation: 11

Aggregation over a large data set: C++, Boost, or RDBMS?

I want to read a file with 600M records in C++ and aggregate over the records that match given criteria on their fields (e.g. empl.location='FL' and empl.dept=3).

Is C++ a viable option? I could go the database route, but since my requirement is read-only aggregation, can I just use C++?

I came across the Boost.MultiIndex library. Is it more appropriate for this kind of operation than plain C++ or a database?

Upvotes: 1

Views: 396

Answers (1)

To determine whether Boost.MultiIndex fits the bill, ask yourself the following two questions:

  1. Is your target computer capable of holding the structure in memory? You'll need roughly N*(I*3*p + sizeof(element)) bytes, where N is the number of elements (600M in your case), I is the number of indices, and p is the size of a pointer (4 bytes on a 32-bit architecture, 8 on a 64-bit one).
  2. Is the number of query patterns relatively low and fixed? A query pattern is the shape of a lookup, such as your example "location=A and dept=B". How many of these will you have? (Modulo some optimizations, you'll need one index per query pattern.)
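To make the first question concrete, here is a small sketch that plugs numbers into the N*(I*3*p + sizeof(element)) estimate. The function name `estimate_bytes` and the 16-byte element size used in the note below are assumptions for illustration; substitute the real size of your record.

```cpp
#include <cstdint>

// Rough memory estimate: N * (I * 3 * p + sizeof(element)).
//   n            - number of elements (600M in the question)
//   indices      - number of indices, one per query pattern
//   pointer_size - 4 on a 32-bit build, 8 on a 64-bit build
//   element_size - sizeof(element); an assumed value, measure your own record
constexpr std::uint64_t estimate_bytes(std::uint64_t n,
                                       std::uint64_t indices,
                                       std::uint64_t pointer_size,
                                       std::uint64_t element_size) {
    return n * (indices * 3 * pointer_size + element_size);
}
```

For example, `estimate_bytes(600000000ULL, 1, 8, 16)` gives 24,000,000,000 bytes (about 24 GB) for a single index on a 64-bit machine with an assumed 16-byte element, and a second index raises that to 38,400,000,000 bytes.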

If the answer to both questions is yes, Boost.MultiIndex can help you.

Upvotes: 4
