Reputation: 13844
From a client machine I have to get 600*10 records every 30 seconds and store them in a DB. I have some 600 IDs, and each ID has 10 parameters, so the total is 6000. So I am thinking about a table structure that will have minimal performance issues.
1st Idea
The column names will be the IDs, ranging from 1 to 600, and the rows will be the parameters, as follows:
ID:    1  2  3  4  5  6  7  8  9  10 .......... 597 598 599 600
P1:    a  b  c  d  e  f  g  h  i  o  .......... d   d   f   h
P2:    g  m  m  k  s  l  l  j  j  k  .......... u   j   j   j
...
P10:   a  d  g  h  j  k  l  l  l  y  .......... k   l   l   l

(one block of 10 parameter rows like this is produced for every 30-second snapshot)
So every 30 seconds I will insert the corresponding parameters into the DB as new rows.
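For illustration only, a rough T-SQL sketch of what this wide layout could look like (the table and column names, dbo.WideReadings and Id001 etc., are just placeholders I made up; most of the 600 ID columns are omitted):

-- Hypothetical sketch of the 1st idea: one column per ID, one row per parameter per snapshot.
CREATE TABLE dbo.WideReadings (
    ReadingTime  DATETIME2(0) NOT NULL,   -- when the 30-second snapshot was taken
    ParameterNo  TINYINT      NOT NULL,   -- 1..10: which of the 10 parameters this row holds
    Id001        VARCHAR(50)  NULL,
    Id002        VARCHAR(50)  NULL,
    -- ... Id003 through Id598 would follow the same pattern ...
    Id599        VARCHAR(50)  NULL,
    Id600        VARCHAR(50)  NULL,
    CONSTRAINT PK_WideReadings PRIMARY KEY (ReadingTime, ParameterNo)
);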
2nd Idea
I will make 10 different tables for the corresponding parameters, plus one table for the IDs, whose primary key is linked by foreign keys to the 10 parameter tables.
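Again a rough sketch with made-up names (dbo.Ids, dbo.Parameter1), showing only one of the ten parameter tables:

-- Hypothetical sketch of the 2nd idea: one Ids table, ten parameter tables keyed to it.
CREATE TABLE dbo.Ids (
    Id INT NOT NULL CONSTRAINT PK_Ids PRIMARY KEY    -- the 600 IDs
);

-- One of the ten parameter tables; Parameter2 .. Parameter10 would look the same.
CREATE TABLE dbo.Parameter1 (
    Id          INT          NOT NULL CONSTRAINT FK_Parameter1_Ids REFERENCES dbo.Ids (Id),
    ReadingTime DATETIME2(0) NOT NULL,
    Value       VARCHAR(50)  NULL,
    CONSTRAINT PK_Parameter1 PRIMARY KEY (Id, ReadingTime)
);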
If you have any other ideas, they are also most welcome.
I use MS SQL Server now, but the DB does not matter (I can use another DB too).
Upvotes: 1
Views: 95
Reputation: 5646
I believe you need the staging table concept. What you should be doing is retrieving all the external data sources and loading them, exactly as they arrive, into staging table(s). After that you load this data into your transactional structure. In this concept, you always differentiate between staging and transactional data structures. By the way, this is usually part of a larger ETL (Extract/Transform/Load) concept.
I will describe one 'flavour' of doing staging:
analyze data in its original form
It can be CSV, Excel, a delimited or formatted file, another database, or whatever. You have to enumerate a finite number of sources and structures to have something to do staging with. It's OK to introduce additional sources later, but for a start concentrate on, let's say, a few of them. Write down a name for each table/field - make things clear from the very start. Write down/document the format of each table/field - that will make it easy for you to write your code, and to look for data errors later in production - for example, a textual field with content that is too long.
create staging table structures
You can do this either in a separate database, a separate schema, or even the same database/schema where your transactional data resides. Database/schema separation is mainly introduced for administrative purposes, like limits on database growth, user permissions to analyze incoming data, etc. It is valuable as a kind of "separation of concerns" as well. So, take the structures you wrote down in the previous step and create tables out of them. I like to add an identity (sequence) column to each data source's table, to easily identify a loaded row of the original data if I have to manipulate the row's data for some reason.
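For example, a staging table for one imaginary source could look like this (a sketch only; the name Staging_DeviceReadings and its columns are assumptions, not taken from any real source):

-- Hypothetical staging table: mirrors the source as-is, plus an identity column
-- to identify each loaded row if it has to be inspected or corrected later.
CREATE TABLE dbo.Staging_DeviceReadings (
    StagingRowId BIGINT IDENTITY(1,1) NOT NULL CONSTRAINT PK_Staging_DeviceReadings PRIMARY KEY,
    DeviceId     VARCHAR(20)  NULL,   -- everything kept as text: load first, validate later
    ParameterNo  VARCHAR(10)  NULL,
    RawValue     VARCHAR(100) NULL,
    ReadAt       VARCHAR(50)  NULL    -- raw timestamp text exactly as received
);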
introduce additional tables/columns
These are computed columns, a table identifying the batch number, a batch number column, etc. - every bit of overhead you believe you need to add to your staging tables to make your staging process work. An example is when you import data multiple times throughout the day, but the process that loads data from staging to transactional tables runs once a day. Then, before you start each data import, you create a batch number in your Batch table, and use that number in an additional BatchNumber column that you've added to each of your staging tables. Your data-loading process will then load the data batch by batch.
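A hedged sketch of that batch bookkeeping, reusing the hypothetical staging table from above:

-- Hypothetical batch bookkeeping: one row per import run.
CREATE TABLE dbo.Batch (
    BatchNumber INT IDENTITY(1,1) NOT NULL CONSTRAINT PK_Batch PRIMARY KEY,
    StartedAt   DATETIME2(0) NOT NULL CONSTRAINT DF_Batch_StartedAt DEFAULT SYSDATETIME()
);

-- Each staging table gets the extra BatchNumber column...
ALTER TABLE dbo.Staging_DeviceReadings
    ADD BatchNumber INT NULL;

-- ...so the loading process can pick up one batch at a time:
-- SELECT ... FROM dbo.Staging_DeviceReadings WHERE BatchNumber = @CurrentBatch;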
add constraints
You should take this step ONLY if you know what you're doing. One reason to have constraints on staging tables is that you don't want to see inconsistent source data at all. Usually this is not needed, and you want all the external data imported into staging.
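If you do decide you need one, it can be as simple as a CHECK constraint on the hypothetical staging table above (again, usually you would skip this and import everything):

-- Hypothetical constraint: reject staged rows with a missing DeviceId outright.
ALTER TABLE dbo.Staging_DeviceReadings
    ADD CONSTRAINT CK_Staging_DeviceReadings_DeviceId
    CHECK (DeviceId IS NOT NULL AND DeviceId <> '');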
create data transformation/load procedures
Here I don't mean SQL Server stored procedures, but the logical procedures needed to stage the data. The implementation may vary between raw stored procedures, SSIS packages, external processes, SQLCLR assemblies, etc. The bottom line is that you have to transform your data from format A in the staging table to format B in the transactional table. Usually I write an SQL query over a staging table and fill the transactional table. You have to cover the case of a failing transformation: what if a datetime in the original data isn't a datetime at all? What if you don't expect empty data - should you replace it with NULL? What about data inconsistency between parent and children, orphan rows for example? By the way, this step is the Transform part of the ETL process.
At the end, this data should be inserted into your transactional structures (the ETL Load part).
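A hedged sketch of such a transform-and-load step over the hypothetical staging table from above, writing into an equally hypothetical dbo.Readings transactional table (TRY_CONVERT, available from SQL Server 2012 on, returns NULL instead of failing on bad input):

-- Hypothetical Transform/Load step: staging rows -> transactional table, one batch at a time.
DECLARE @CurrentBatch INT = 42;                 -- placeholder: the batch being loaded

INSERT INTO dbo.Readings (DeviceId, ParameterNo, Value, ReadAt)
SELECT
    TRY_CONVERT(INT, s.DeviceId),               -- NULL if the source text isn't a number
    TRY_CONVERT(TINYINT, s.ParameterNo),
    NULLIF(LTRIM(RTRIM(s.RawValue)), ''),       -- empty strings become NULL
    TRY_CONVERT(DATETIME2(0), s.ReadAt)         -- NULL if the source text isn't a datetime
FROM dbo.Staging_DeviceReadings AS s
WHERE s.BatchNumber = @CurrentBatch;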
When you implement that, you need a test ride - take some of your data sources and try to run them through the transformation. If at least one of them fails, that is good, because you've found something unexpected and didn't try to stuff the production table with it ;)
I didn't cover creating a schedule for the ETL process, alerting and reporting on data-import failures, data cleansing, performing match/merge, etc. I also haven't covered some implementation details, like whether you should delete (or better, truncate) the imported data from the staging tables after the ETL process, or let it stay there as a kind of import archive. This can be a rather complex process, and the right concepts/tools for the job can make it easier.
HTH
Upvotes: 1
Reputation: 29619
In general, I'd recommend designing the database to model the problem domain as closely as possible, and only worry about performance when you can prove you have a problem.
From your question, I can't quite get a handle on the underlying data - 10 records every 30 seconds, with 600 columns, seems to be the best description. So, start by creating a table which models that - pretty much your "option 1".
Create a test environment and pump it full of data, so you can recreate production-scale performance issues. 10 records * 120 intervals/hour * 24 hours = ~29K rows/day; push in 6 months' worth or so (Google DBMonster for tools which help you create test data; Redgate has a product as well).
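For example, assuming a wide table along the lines of the hypothetical dbo.WideReadings sketch in the question, something like this pushes in roughly six months of 30-second snapshots (a sketch only; a proper numbers/tally table or a data-generation tool would do the same job more flexibly):

-- Hypothetical test-data load: ~180 days * 2,880 snapshots/day * 10 parameter rows.
;WITH Numbers AS (
    SELECT TOP (180 * 24 * 120)                          -- 518,400 snapshot numbers
           ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1 AS n
    FROM sys.all_objects AS a CROSS JOIN sys.all_objects AS b
)
INSERT INTO dbo.WideReadings (ReadingTime, ParameterNo, Id001, Id002 /* , ...remaining ID columns... */)
SELECT DATEADD(SECOND, 30 * n, '20240101'),              -- one snapshot every 30 seconds
       p.ParameterNo,
       'x', 'x'                                          -- dummy values; realistic test data should vary
FROM Numbers
CROSS JOIN (VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10)) AS p(ParameterNo);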
Run your inserts, selects, and whatever other queries run against this data, and tune everything. If that doesn't work, buy bigger hardware. Only if that doesn't work either should you consider a solution which doesn't model the domain's data structure - the long-term cost of maintenance and of getting developers up to speed with your exotic solution usually outweighs the cost of hardware by an order of magnitude.
If you reach this point, options to consider:
Upvotes: 1