Reputation: 1674
Every day I import 2,000,000 rows from text files into SQL Server 2008 using BULK INSERT, and then I do some post-processing to update the records.
I have some indexes on the table so that the post-processing runs as fast as possible; normally the post-processing script takes about 40 seconds.
But sometimes (I don't know when) the post-processing stalls: it is still not finished after an hour! After I rebuild the indexes, everything is back to normal.
What should I do to prevent this problem from happening?
Right now I have a nightly job that rebuilds all indexes. Why does the index fragmentation grow to as much as 90%?
Update: Here is the table that I import the text files into:
CREATE TABLE [dbo].[Jam_Transactions](
[Jam_TransactionId] [bigint] NOT NULL,
[FileId] [int] NOT NULL,
[RowNo] [int] NOT NULL,
[TransactionTypeId] [smallint] NOT NULL,
[TransactionDate] [datetime] NOT NULL,
[TransactionNumber] [dbo].[TransactionNumber] NOT NULL,
[CardNumber] [dbo].[CardNumber] NULL,
[AccountNumber] [dbo].[CardNumber] NULL,
[BankCardTypeId] [smallint] NOT NULL,
[AcqBankId] [smallint] NOT NULL,
[DeviceNumber] [dbo].[DeviceNumber] NOT NULL,
[Amount] [dbo].[Amount] NOT NULL,
[DeviceTypeId] [smallint] NOT NULL,
[TransactionFee] [dbo].[Amount] NOT NULL,
[AcqSwitchId] [tinyint] NOT NULL
) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [_dta_index_Jam_Transactions_8_1290487676__K1_K4_K12_K6_K11_5] ON [dbo].[Jam_Transactions]
(
[Jam_TransactionId] ASC,
[TransactionTypeId] ASC,
[Amount] ASC,
[TransactionNumber] ASC,
[DeviceNumber] ASC
)
INCLUDE ( [TransactionDate]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [_dta_index_Jam_Transactions_8_1290487676__K12_K6_K11_K1_5] ON [dbo].[Jam_Transactions]
(
[Amount] ASC,
[TransactionNumber] ASC,
[DeviceNumber] ASC,
[Jam_TransactionId] ASC
)
INCLUDE ( [TransactionDate]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [IX_Jam_Transactions] ON [dbo].[Jam_Transactions]
(
[Jam_TransactionId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
Upvotes: 3
Views: 4934
Reputation: 9
I would try taking the index offline before the mass row insertion and then bringing it back online afterwards. That is much faster than re-indexing or performing a drop and create of the index. The difference is that the index is still there and its data is still stored, but the index is not being used while it is "offline", until it is brought back "online". I have a 1.5-million-row insertion process and had a problem with one of my nonclustered indexes fragmenting, which was causing poor performance. It went from 99% fragmentation to 0.14% using the MSSQL offline/online index option.
Code sample:
ALTER INDEX idx_a ON dbo.tbl_A
REBUILD WITH (ONLINE = OFF);
Toggle between OFF and ON and you are good to go.
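Note that the statement above is itself a rebuild; ONLINE = OFF only controls whether the table stays available while the rebuild runs. If "offline" is meant as "not maintained during the load", one way to get that in SQL Server is to disable the nonclustered index and rebuild it afterwards. A minimal sketch, assuming the index and table names from the question's DDL (only do this to nonclustered indexes; disabling a clustered index makes the table inaccessible):

```sql
-- Stop maintaining the nonclustered index during the load.
-- The index definition stays in the catalog, but its pages are dropped.
ALTER INDEX [IX_Jam_Transactions] ON [dbo].[Jam_Transactions] DISABLE;

-- ... run the BULK INSERT here ...

-- Rebuilding re-enables the index and builds it fresh, so it starts unfragmented.
ALTER INDEX [IX_Jam_Transactions] ON [dbo].[Jam_Transactions] REBUILD;
```

The rebuild cost is still paid once per load, so whether this beats leaving the index enabled depends on how many rows arrive relative to the table size.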
Upvotes: 0
Reputation: 8693
Have you tried just refreshing the statistics after such a large insert:
UPDATE STATISTICS my_table
My experience with large bulk inserts is that the statistics get all mangled up and need a refresh afterwards. It's also much faster than running an index REBUILD or REORGANIZE.
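As a slightly fuller sketch of the same idea, assuming the table and index names from the question's DDL, the statistics can be refreshed for the whole table or for a single index, with an explicit sampling rate:

```sql
-- Refresh every statistics object on the table by reading all rows.
UPDATE STATISTICS dbo.Jam_Transactions WITH FULLSCAN;

-- Cheaper alternative: refresh only one index's statistics from a sample.
-- The 25 percent figure is an arbitrary illustration, not a recommendation.
UPDATE STATISTICS dbo.Jam_Transactions IX_Jam_Transactions WITH SAMPLE 25 PERCENT;
```

Running this between the BULK INSERT and the post-processing step gives the optimizer row counts that reflect the newly loaded data.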
Another option is to look into padding the index. You likely have no fill factor set on your indexes, meaning that if your index is:
A, B, D, E, F
and you insert a value with a CardNumber of C, then your index will look like:
A, B, D, E, F, C
and hence be ~20% fragmented. If you instead specify a fill factor that leaves, say, 15% of each page free (FILLFACTOR = 85 in SQL Server terms), it would look roughly like:
A, B, D, _, E, F
(Note that internally the empty space is placed roughly in the middle, not at the end.)
So when you insert the C value it ends up closer to its correct position; in practice the engine sees that C only needs to be swapped with D and usually moves the D at that point.
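A fill factor is applied when an index is created or rebuilt. A minimal sketch, assuming one of the indexes from the question and an illustrative value of 85 (pages filled to 85%, leaving about 15% free):

```sql
-- FILLFACTOR applies to leaf pages; PAD_INDEX = ON applies the same
-- percentage to the intermediate levels of the index as well.
ALTER INDEX [IX_Jam_Transactions] ON [dbo].[Jam_Transactions]
REBUILD WITH (FILLFACTOR = 85, PAD_INDEX = ON);
```

The free space is only restored at the next rebuild, and a lower fill factor makes the index larger on disk, so the value is a trade-off rather than a fix.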
Beyond that, are you sure that the fragmentation is actually the problem? As part of reindexing, the table is read and loaded entirely into memory (provided it fits), and thus any query you run on it afterwards will be very fast.
Upvotes: 2
Reputation: 29194
Does your main table keep growing by 2 million rows per day or is there a lot of deleting taking place as well? Could you bulk insert into a temporary import table and do your processing prior to inserting into the main table? You can always use hints to force your queries to use certain indices:
SELECT *
FROM your_table_name WITH (INDEX(your_index_name))
WHERE your_column_name = 5
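The staging-table idea could look roughly like the sketch below. The staging table name, file path, and terminators are hypothetical placeholders, and the main table name is taken from the index definitions in the question:

```sql
-- Create an empty, index-free copy of the main table's columns.
SELECT TOP (0) *
INTO dbo.Jam_Transactions_Staging
FROM dbo.Jam_Transactions;

-- Load the day's file into the staging table.
BULK INSERT dbo.Jam_Transactions_Staging
FROM 'C:\import\transactions.txt'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', TABLOCK);

-- ... run the post-processing UPDATEs against the staging table here ...

-- Move the finished rows into the main table in a single statement.
INSERT INTO dbo.Jam_Transactions WITH (TABLOCK)
SELECT * FROM dbo.Jam_Transactions_Staging;

DROP TABLE dbo.Jam_Transactions_Staging;
```

With this layout the post-processing only touches the two million new rows, and the main table's indexes see one insert per row instead of an insert followed by updates.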
Upvotes: 0
Reputation: 280252
Instead of including this table in the nightly job, why don't you make index maintenance (on this table specifically) part of the nightly import job, between BULK INSERT and whatever 'post-processing' is?
We don't have enough information to know why the index fragmentation grows that quickly. Which index(es)? How many indexes are there? What is the order of the data in the file?
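One way to answer the "which index(es)?" question is to query the physical-stats DMV right after the import. A sketch, assuming the table name from the question's index definitions:

```sql
-- Fragmentation and size per index on the imported table.
-- 'LIMITED' is the cheapest scan mode and is enough for fragmentation figures.
SELECT i.name AS index_name,
       ps.index_type_desc,
       ps.avg_fragmentation_in_percent,
       ps.page_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID(N'dbo.Jam_Transactions'), NULL, NULL, 'LIMITED') AS ps
JOIN sys.indexes AS i
  ON i.object_id = ps.object_id
 AND i.index_id  = ps.index_id
ORDER BY ps.avg_fragmentation_in_percent DESC;
```

Comparing the output before and after a load shows which indexes the import actually fragments, and whether the table is a heap (index_type_desc = 'HEAP').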
You may also consider using the ORDER option in the BULK INSERT statement to change the way the data is inserted. It may make the load take longer, but it should reduce the need to reorganize. Again, this depends on the order of the source data and the index(es) that become fragmented.
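A sketch of the ORDER hint follows. The file path and terminators are hypothetical, and the example assumes a clustered index on Jam_TransactionId, which the posted DDL does not show. SQL Server ignores ORDER when the table has no clustered index or when the file is not actually sorted on the clustered key:

```sql
BULK INSERT dbo.Jam_Transactions
FROM 'C:\import\transactions.txt'
WITH (
    FIELDTERMINATOR = ',',
    ROWTERMINATOR   = '\n',
    -- Declares that the file is already sorted on the clustered key.
    ORDER (Jam_TransactionId ASC),
    TABLOCK
);
```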
Finally, what is the impact of rebuilding/not rebuilding or reorganizing/not reorganizing the indexes? Have you tried both? Perhaps it makes the post-processing run quicker if you rebuild, but perhaps only a defragment is necessary. And while it may make the post-processing quicker, what about the queries that are run against the table later in the day? Have you done any metrics against those to see if they speed up or slow down depending on what you do at night?
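To compare the two maintenance options, both can be run against the whole table and timed, along with the post-processing that follows. A sketch, assuming the table name from the question's index definitions:

```sql
-- Lighter option: defragments leaf pages in place.
-- It does not update statistics, so those are refreshed separately.
ALTER INDEX ALL ON dbo.Jam_Transactions REORGANIZE;
UPDATE STATISTICS dbo.Jam_Transactions;

-- Heavier option: builds every index from scratch, which also
-- refreshes the statistics that belong to those indexes.
ALTER INDEX ALL ON dbo.Jam_Transactions REBUILD;
```

If the post-processing is fast after REORGANIZE plus UPDATE STATISTICS, or even after UPDATE STATISTICS alone, that suggests stale statistics rather than fragmentation were the main cause of the slow runs.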
Upvotes: 1