Rogman

Reputation: 31

How to compare rows in source and destination tables dynamically in SQL Server

We receive a data feed from our customers, and the schema is roughly the same each time, though it can change on the customer end because they use a 3rd-party application. When we receive the data files, we import the data into a staging database with a table for each data file (students, attendance, etc.).

We then want to compare that data to the data we already have in the database for that customer and see what has changed from the previous run (either a column value changed or a whole row was deleted). We want to write the updated values or deleted rows to an audit table so we can go back later and see what changed from the previous data import. We don't want to update the data itself; we only want to record what's different between the two datasets. We will then delete all the data from the customer database and import the data exactly as is from the new data files, without changing it (this directive has been handed down and cannot change).

The big problem is that I need to do this dynamically, since I don't know exactly what schema I'm going to get from our customers; they can make customizations to their tables. I need to be able to dynamically determine what tables there are in the destination, and their structure, and then look at the source and compare the values to see what has changed in the data.

Additional info: There are no ID columns in the source, though there are several columns that together can be used as a surrogate key to uniquely identify a row.

I'd like to be able to do this generically for each table without having to hard-code values in, though I might have to do that for the surrogate keys for each table in a separate reference table.

I can use either SSIS, SPs, triggers, etc., whichever would make more sense. I've looked at all, including tablediff, and none seem to have everything I need or the logic starts to get extremely complex once I get into them.

Of course any specific examples anyone has of something like this they have already done would be greatly appreciated.

Let me know if there's any other information that would be helpful.

Thanks

Upvotes: 3

Views: 4392

Answers (2)

Jeroen Bolle

Reputation: 1836

I've worked on a similar problem and used a series of metadata tables to compare datasets dynamically. These metadata tables described which datasets needed to be staged and which combination of columns (and their data types) served as the business key for each table.

This way you can dynamically construct a SQL query (e.g., with an SSIS script component) that performs a full outer join to find the differences between the two datasets.

You can join your own metadata with SQL Server's metadata (using the sys.* or INFORMATION_SCHEMA.* views) to detect whether the columns still exist in the source and whether the data types are what you anticipated.

Redirect unmatched metadata to an error flow for evaluation.

This way of working is risky, but it can be done if you maintain your metadata well.
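To make the dynamic construction step concrete, here is a minimal sketch of a query builder driven by metadata (Python is used just for illustration; the schema names, table name, and column names are assumptions, and the generated T-SQL would be executed against SQL Server separately):

```python
def build_compare_sql(table, key_cols, value_cols,
                      src_schema="staging", dst_schema="dbo"):
    """Build a T-SQL full outer join that classifies rows in a staging
    table versus the existing destination table as inserts, deletes,
    or updates. key_cols/value_cols come from a metadata table."""
    q = lambda name: f"[{name}]"  # bracket-quote identifiers from metadata
    join_on = " AND ".join(f"s.{q(c)} = d.{q(c)}" for c in key_cols)
    # A value differs if the values are unequal, or exactly one side is NULL.
    diff = " OR ".join(
        f"s.{q(c)} <> d.{q(c)} OR (s.{q(c)} IS NULL) <> (d.{q(c)} IS NULL)"
        for c in value_cols)
    select_cols = ", ".join(
        [f"s.{q(c)} AS src_{c}" for c in key_cols + value_cols] +
        [f"d.{q(c)} AS dst_{c}" for c in key_cols + value_cols])
    first_key = q(key_cols[0])  # assumes key columns are never NULL
    return (
        f"SELECT {select_cols},\n"
        f"  CASE WHEN d.{first_key} IS NULL THEN 'insert'\n"
        f"       WHEN s.{first_key} IS NULL THEN 'delete'\n"
        f"       ELSE 'update' END AS change_type\n"
        f"FROM {q(src_schema)}.{q(table)} AS s\n"
        f"FULL OUTER JOIN {q(dst_schema)}.{q(table)} AS d\n"
        f"  ON {join_on}\n"
        f"WHERE d.{first_key} IS NULL OR s.{first_key} IS NULL OR {diff}"
    )

sql = build_compare_sql("students", ["student_id"], ["name", "grade"])
```

The result set could then be written straight to the audit table; rows that match on the key with no differing values never appear in the output.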

Upvotes: 2

jerry

Reputation: 1857

If you want to compare two tables to see what is different, the keyword is EXCEPT:


SELECT col1, col2, ... FROM table1
EXCEPT
SELECT col1, col2, ... FROM table2

This gives you everything in table1 that is not in table2.


SELECT col1, col2, ... FROM table2
EXCEPT
SELECT col1, col2, ... FROM table1

This gives you everything in table2 that is not in table1.


Assuming you have some kind of useful, durable primary key on the two tables (and treating table1 as the new data and table2 as the old): a key that appears in both result sets is a change, a key that appears only in the first set is an insert, and a key that appears only in the second set is a delete.
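The classification above can be sketched end to end. This example uses Python with SQLite, which also supports EXCEPT, purely to make the logic runnable; the table names, columns, and sample rows are invented:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE old_students (student_id INTEGER, name TEXT, grade TEXT);
    CREATE TABLE new_students (student_id INTEGER, name TEXT, grade TEXT);
    INSERT INTO old_students VALUES (1,'Ann','A'), (2,'Bob','B'), (3,'Cal','C');
    INSERT INTO new_students VALUES (1,'Ann','A'), (2,'Bob','B+'), (4,'Dee','D');
""")

# Rows in the new feed that are not in the old data.
new_only = con.execute(
    "SELECT * FROM new_students EXCEPT SELECT * FROM old_students").fetchall()
# Rows in the old data that are not in the new feed.
old_only = con.execute(
    "SELECT * FROM old_students EXCEPT SELECT * FROM new_students").fetchall()

# With a durable key (student_id, column 0), a key present in both
# result sets is an update; only in the first, an insert; only in the
# second, a delete.
new_keys = {row[0] for row in new_only}
old_keys = {row[0] for row in old_only}
updates = new_keys & old_keys   # Bob's grade changed
inserts = new_keys - old_keys   # Dee is new
deletes = old_keys - new_keys   # Cal was removed
```

With this sample data, `updates` is `{2}`, `inserts` is `{4}`, and `deletes` is `{3}`, which is exactly what would be written to the audit table.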

Upvotes: 1
