Lawrence Wagerfield
Lawrence Wagerfield

Reputation: 6611

Which DBMS should I use for 1000s of columns and no joins?

We have a table design that consists of 10,000,000 records and 200,000 columns.

The columns are a mixture of:

The queries need to perform and / or operations on 1-100 columns at a time, and should complete in under 0.1 seconds, returning a only projection/subset of each matched row.

Around 10 new columns get added per day.

Around 1,000 new rows get added per day.

There are no joins.

Which DBMS is best suited for this?

Reason behind this approach:
The columns are materialized indexes from user defined queries: that's why new columns get added each day (as more users come up with their own queries). The other option would be to not use materialized views, and have the user's queries perform joins. Problem here is the queries could take any form and in aggregate there would be a large number of very different execution plans across everyones query... since the user defines the query, it's kinda impossible to optimise a traditional SQL database using indexes, normalised tables, etc.

Upvotes: 1

Views: 250

Answers (2)

Branko Dimitrijevic
Branko Dimitrijevic

Reputation: 52137

First, I'd suggest measuring ad-hoc JOINs, and only doing further optimization if you find the performance lacking. I understand it could be difficult to measure every possible query, but you may be able to cover most common/representative cases, and if they perform well-enough just stop there. There is a lot that can be done with good indexing!

Second, and only if the measurements above warrant it, create a new separate materialized view for each ad-hoc query.

  • Some databases will be able to maintain such views automatically for you1, so if the "base" data changes, relevant results will be automatically added or removed from the materialized view (just as they would from the "live" query result).
  • Other databases may allow periodic refresh2.

Be warned though: maintaining materialized views is not free, and having thousands of them (especially if they are constantly kept up-to-date, as opposed to periodically refreshed) will definitely impact the insert/update/delete performance on the base data!


1 E.g. SQL Server indexed views.

2 E.g. Oracle Materialized views, although it looks like 12c can also do something close to SQL Server's immediate refresh.

Upvotes: 1

TheGameiswar
TheGameiswar

Reputation: 28920

Keeping aside ,why you want to go with 1000 of columns,you can look at below databases which support,unlimited columns

enter image description here

References: https://en.wikipedia.org/wiki/Comparison_of_relational_database_management_systems

Upvotes: 1

Related Questions