Reputation: 353
I have an existing table (VendorMaster) with 18 columns and around 200,000 rows in it. Now I have built a new table (NewVendorMaster) with different logic to get the same data under the same schema.
Schema of VendorMaster and NewVendorMaster:

Columns: V1 V2 V3 V4 V5 V6 V7 V8 ..... V16
Sample values: data1 data2 data3 data4 data5 dataN type3 type16 data3 data4 type3 type17 data14 data18 type20 data4 type17 data14 type45 type20
Now I want to compare those two tables. As there are 18 columns and 200,000 records, should I go for a column-wise comparison, or concatenate all data into one column and then compare for a faster result?
A join on these two tables seems to take hours to complete.
Also, would creating an index on both tables help make the comparison faster?
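For reference, a minimal sketch of the two approaches being weighed here, assuming SQL Server and using the V1..V16 names from the schema above as placeholder columns (NULL handling and the remaining columns are elided):

-- Column-wise comparison: rows of VendorMaster that have no identical row
-- in NewVendorMaster, comparing every column directly.
SELECT vm.*
FROM VendorMaster AS vm
WHERE NOT EXISTS (
    SELECT 1
    FROM NewVendorMaster AS nvm
    WHERE nvm.V1 = vm.V1
      AND nvm.V2 = vm.V2
      AND nvm.V3 = vm.V3
      -- ... repeat for the remaining columns up to V16
);

-- Concatenated comparison: build one string per row and compare those.
-- This only works if the separator never appears in the data itself.
SELECT vm.*
FROM VendorMaster AS vm
WHERE NOT EXISTS (
    SELECT 1
    FROM NewVendorMaster AS nvm
    WHERE CONCAT(nvm.V1, '|', nvm.V2, '|', nvm.V3) =
          CONCAT(vm.V1, '|', vm.V2, '|', vm.V3)
      -- ... extend the CONCAT() lists with the remaining columns up to V16
);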
Upvotes: 0
Views: 1682
Reputation: 6002
I would advise against concatenating the fields before comparing and instead suggest creating a checksum value and working from there. I've done something similar and approached it like this: first I added an extra field to the table that holds the BINARY_CHECKSUM() of the key fields (you can use a computed column for this!) and put an index on just that crc field but INCLUDE() all the actual key fields. The generated values are not going to be unique, but they will be close enough to work. Mind that it might take a while for the index to be created, and depending on the size it could take up quite a bit of space too.
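A minimal sketch of that setup, assuming SQL Server and the placeholder V1..V16 column names from the question (substitute the real key fields and extend the lists):

-- Add a computed checksum column over the key fields on both tables.
ALTER TABLE VendorMaster
    ADD crc AS BINARY_CHECKSUM(V1, V2, V3 /* , ... up to V16 */) PERSISTED;

ALTER TABLE NewVendorMaster
    ADD crc AS BINARY_CHECKSUM(V1, V2, V3 /* , ... up to V16 */) PERSISTED;

-- Index just the checksum, but INCLUDE the key fields so the comparison
-- can be answered from the index without touching the base table.
CREATE INDEX IX_VendorMaster_crc
    ON VendorMaster (crc)
    INCLUDE (V1, V2, V3 /* , ... up to V16 */);

CREATE INDEX IX_NewVendorMaster_crc
    ON NewVendorMaster (crc)
    INCLUDE (V1, V2, V3 /* , ... up to V16 */);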
When JOINing the two tables, join on the crc plus all the key fields (**). SQL will be quite good at matching just the integer to find the right row(s), and comparing the other fields can then be done from the included columns.
(**: don't rely on just the crc value; BINARY_CHECKSUM() is fast and easy to use, but it will probably have collisions!)
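Put together, the comparison join might look roughly like this (again with the placeholder V1..V16 column names):

-- Match on the cheap integer checksum first, then verify the actual
-- key fields to rule out BINARY_CHECKSUM() collisions.
SELECT vm.*
FROM VendorMaster AS vm
JOIN NewVendorMaster AS nvm
    ON  nvm.crc = vm.crc
    AND nvm.V1  = vm.V1
    AND nvm.V2  = vm.V2
    AND nvm.V3  = vm.V3
    -- ... repeat for the remaining key fields up to V16
;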
Upvotes: 2
Reputation: 116
This is not a complete answer, but it may help save some time
I was doing a very similar thing - normalising a 2.5-million-row one-dimensional database into half a dozen tables.
What I ended up doing was using C to write short programs to load the data in, step through it, and then insert new records into the new database tables, because the clever SQL took all night and didn't actually generate a result.
Using C I could write debug output to see where things were happening, and better still, when things went wrong I could see what was going wrong, and best of all, when the job was half done, it was half done - aborting the program halfway through at least left some good new data in the new table.
So my main point is this: don't try to be clever and construct one magic SQL query that does all the work. Break the job into steps and program it. It runs massively faster.
And you know far better how the data is arranged than SQL ever will.
(Adding indexes is always a good idea on tables that don't change much, and it really speeds up searches. You can always remove them later on.)
Upvotes: 0