user899055
user899055

Reputation: 301

SQL Server 2008 - multiple import processes simultaneously

I have a scenario where multiple users will be doing import processes, but all of them will be working for different clients.

I have one core table which gets the most hits whenever import processes run. I have 2 options now

  1. To have one core table and do the sequential imports by making queue for the import processes.
  2. To have 300 core table, one for each client, it will allow the users to work on the import processes simultaneously without waiting for one another.

Can anyone suggest which one is better and why?


I am giving my requirements in more detailed this time. Can you all once again have a look at and provide your comments after going through the requirements.

The query is regarding data modeling for core functionality of my application.

I have a scenario where multiple users will be doing import processes, but all of them will be working for different clients. Also, at the same time client's data could be shown to the user and can be modified/inserted too, while the import process for the same or different client is in process.

I have 2 core tables which get the most hits whenever import processes run.

I have 2 options now 1. To have 2 core tables and do the sequential imports by making queue for the import processes.

Table 1

ID ClientID SourceID Count AnotherCol1 AnotherCol2 AnotherCol3

Table 2

ID ClientID OrderID Count AnotherCol4 AnotherCol5 AnotherCol6

  1. To have 1000 core table, 2 for each client (I may have maximum 500 clients), it will allow the users to work on the import processes simultaneously without waiting for one another.

More information about the import process: 1. These table is not going to be used in any Reporting. 2. Each import process will insert 20k-30k records (7 columns) in these each table. And there will be around 40-50 such imports in a day. 3. While the import process is going on, data could be retrieved from these tables by some other user and INSERT OR UPDATED too. 4. These are going to be one of the most usable tables in the application. 5. BULK INSERT will be used for insertion. 6. Clustered index is on the Primary Key which is an Identity column. 7. We are considering the table partitioning too.

Can you please suggest which option is better and why?

Also, if you suggest to go with option 2, then would it not be a performance hit to create so many tables in the database? Should we create a separate database for these 1000 tables in this case?


I am giving my requirements in more detailed this time. Can you all once again have a look at and provide your comments after going through the requirements.

The query is regarding data modeling for core functionality of my application.

I have a scenario where multiple users will be doing import processes, but all of them will be working for different clients. Also, at the same time client's data could be shown to the user and can be modified/inserted too, while the import process for the same or different client is in process.

I have 2 core tables which get the most hits whenever import processes run.

I have 2 options now 1. To have 2 core tables and do the sequential imports by making queue for the import processes.

Table 1

ID ClientID SourceID Count AnotherCol1 AnotherCol2 AnotherCol3

Table 2

ID ClientID OrderID Count AnotherCol4 AnotherCol5 AnotherCol6

  1. To have 1000 core table, 2 for each client (I may have maximum 500 clients), it will allow the users to work on the import processes simultaneously without waiting for one another.

More information about the import process: 1. These table is not going to be used in any Reporting. 2. Each import process will insert 20k-30k records (7 columns) in these each table. And there will be around 40-50 such imports in a day. 3. While the import process is going on, data could be retrieved from these tables by some other user and INSERT OR UPDATED too. 4. These are going to be one of the most usable tables in the application. 5. BULK INSERT will be used for insertion. 6. Clustered index is on the Primary Key which is an Identity column. 7. We are considering the table partitioning too.

Can you please suggest which option is better and why?

Also, if you suggest to go with option 2, then would it not be a performance hit to create so many tables in the database? Should we create a separate database for these 1000 tables in this case?

Upvotes: 3

Views: 1235

Answers (2)

Cade Roux
Cade Roux

Reputation: 89671

Another option is a third scenario where you have a single table, but you still can do the imports in parallel by having a batch identifier in the table which stops people stepping on each other.

The main problem with having multiple people in the same table is that you cannot do things like TRUNCATE.

For me the decision would be related to about where the data eventually goes. Is this just a staging table for convenience because there is going to be some SQL for transform or lookup run against it after load? Would it be possible to make such tables in a separate database or schema and with unique names which would make it easy for them to be cleaned up without interfering with or bloating the transaction log in your primary database> Do you need to insert in bulk, then apply indexes and eventually drop the table? Is such a table even necessary if you are using SSIS to load the data, you can often do a lot of work in the pipeline without needing a staging table?

All these things would play into my decision making process on the architecture.

Upvotes: 2

David
David

Reputation: 1611

It's not really a question with a definitive answer as each has it's own benefits and drawbacks.

Scenario 1: Central core table

  • Pros: Central table, easy global modifications
  • Cons: Slower import, more difficult client-level modifications

Scenario 2: 300 core tables

  • Pros: Faster imports, easy client customization
  • Cons: More difficult to deploy changes against all 300 core tables, reporting which needs to touch all tables will be more complicated and probably slower as well

In the end the answer is whatever really works for you

Upvotes: 3

Related Questions