Reputation: 405
I'm considering converting some Excel files I regularly update to a database. The files have a large number of columns. Unfortunately, many of the databases I am looking at, such as Access and PostgreSQL, have very low column limits. MySQL's is higher, but I'm worried that as my dataset expands I might break that limit as well.
Basically, I'm wondering what (open source) databases are effective at dealing with this type of problem.
For a description of the data: I have a number of Excel files (fewer than 10), each containing a particular piece of information on some firms over time. It totals about 100 MB in Excel files. The firms are in the columns (about 3500 currently), the dates are in the rows (about 270 currently, but switching to a higher frequency for some of the files could easily cause this to balloon).
The most important queries will likely be to get the data for every firm on a particular date and put it in a matrix. However, I may also run queries to get, for a particular firm, all the values of one piece of data over every date.
Changing dates to a higher frequency is also the reason that I'm not really interested in transposing the data (the 270 already beats Access's limit, and increasing the frequency would far exceed MySQL's column limit as well). Another alternative might be to give each firm its own Excel file (that way each file would have fewer than 10 columns), but that is quite unwieldy for the purposes of updating the data.
Upvotes: 0
Views: 70
Reputation: 13133
This seems to be begging to be split up!
How about using a schema like:
Firms
    id
    name

Dates
    id
    date

Data_Points
    id
    firm_id
    date_id
    value
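As a minimal sketch, that schema might be created with DDL along these lines (MySQL-flavored; the column types and constraints are assumptions, not part of the original answer):

CREATE TABLE firms (
    id   INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(255) NOT NULL
);

CREATE TABLE dates (
    id   INT AUTO_INCREMENT PRIMARY KEY,
    date DATE NOT NULL
);

CREATE TABLE data_points (
    id      INT AUTO_INCREMENT PRIMARY KEY,
    firm_id INT NOT NULL,
    date_id INT NOT NULL,
    value   DOUBLE,
    FOREIGN KEY (firm_id) REFERENCES firms (id),
    FOREIGN KEY (date_id) REFERENCES dates (id)
);

Each row of data_points holds one value for one firm on one date, so adding firms or dates only adds rows, never columns.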
This sort of decomposed schema will make reporting quite a bit easier.
For reporting you can easily get a stream of all values with a query like:

SELECT firms.name, dates.date, data_points.value
FROM data_points
LEFT JOIN firms ON firms.id = data_points.firm_id
LEFT JOIN dates ON dates.id = data_points.date_id;
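For the question's main use case, every firm's value on a single date, a query along these lines should work (the date literal is only a placeholder):

SELECT firms.name, data_points.value
FROM data_points
JOIN firms ON firms.id = data_points.firm_id
JOIN dates ON dates.id = data_points.date_id
WHERE dates.date = '2013-01-31';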
Upvotes: 3