Reputation: 5932
I have a process which populates an Oracle table which has over 100 million rows. The table structure is as follows
**ORACLE_TABLE**
id|contractdatetime|Attr3|Attr4|Attr5
The combination of (id,contractdatetime)
is unique in this table, which gets populated using an external process.
The total distinct id
is approx 30000 only.
For every id there is a unique contractdatetime. The id isn't unique, but the combination of (id,contractdatetime)
is
Now another process populates an identical table in SQL Server
**SQLSERVER_TABLE**
id|contractdatetime|Attr3|Attr4|Attr5
I am thinking about the best way to check if the data is both tables is identical.
I thought if I can get a hashed version by contractid
and somehow aggregate all the other attributes in Oracle. And if I can do the same in SQL Server, I would be able to compare this in Excel itself (30000) rows.
I have searched StackOverflow and couldn't get an identical function for MD5_XOR or anything which can help achieve this as per the link below. http://www.db-nemec.com/MD5/CompareTablesUsingMD5Hash.html
The other options of using a linked server etc would take lot more difficulty in getting approvals for doing this.
Is there a good way to go about with this?
Upvotes: 2
Views: 1658
Reputation: 36922
For a fast, high-level comparison between an Oracle and a SQL Server table you can use aggregations of the functions STANDARD_HASH
and HASH_BYTES
.
--Create a simple table.
create table table1
(
id number,
contractdatetime date,
Attr3 varchar2(100),
Attr4 varchar2(100),
Attr5 varchar2(100)
);
--Insert 4 rows, the first three will be identical between databases,
--the last row will be different.
insert into table1 values (1, date '2000-01-01', 'a', 'a', 'a');
insert into table1 values (2, date '2000-01-01', 'b', 'b', 'b');
insert into table1 values (2, date '2000-01-02', null, null, null);
insert into table1 values (3, date '2000-01-02', 'Oracle', 'Oracle', 'Oracle');
commit;
select
id,
--Format the number
trim(to_number(
--Sum per group.
sum(
--Convert to a number.
to_number(
--Get the first 14 bytes. This seems to be the maximum that SQL Server can handle
--before it runs into math errors.
substr(
--Hash the value.
standard_hash(
--Concatenate the values using (hopefully) unique strings to separate the
--columns and represent NULLs (because the hashing functions treat nulls differently.)
nvl(to_char(contractdatetime, 'YYYY-MM-DD HH24:MI:SS'), 'null') ||
'-1-' || nvl(attr3, 'null') || '-2-' || nvl(attr3, 'null') || '-3-' || nvl(attr3, 'null')
, 'MD5')
, 1, 14)
, 'xxxxxxxxxxxxxxxxxxxx'))
, '99999999999999999999')) hash
from table1
group by id
order by 1;
create table table1
(
id numeric,
contractdatetime datetime,
Attr3 varchar(100),
Attr4 varchar(100),
Attr5 varchar(100)
);
insert into table1 values (1, cast('2000-01-01 00:00:00.000' as datetime), 'a', 'a', 'a');
insert into table1 values (2, cast('2000-01-01 00:00:00.000' as datetime), 'b', 'b', 'b');
insert into table1 values (2, cast('2000-01-02 00:00:00.000' as datetime), null, null, null);
insert into table1 values (3, cast('2000-01-02 00:00:00.000' as datetime), 'SQL Server', 'SQL Server', 'SQL Server');
commit;
select
id,
sum(
convert(bigint, convert(varbinary,
substring(
hashbytes('MD5',
isnull(convert(varchar(19), contractdatetime, 20), 'null') +
'-1-' + isnull(attr3, 'null') + '-2-' + isnull(attr3, 'null') + '-3-' + isnull(attr3, 'null'))
, 1, 7)
, 1))) hash
from table1
group by id
order by 1;
As expected, the hashes for the first two groups are identical, and the hash for the third group is different.
Oracle:
ID HASH
1 50696302970576522
2 69171702324546493
3 50787287321473273
SQL Server
ID HASH
1 50696302970576522
2 69171702324546493
3 7440319042693061
Here is an Oracle fiddle and a SQL Server fiddle.
Upvotes: 4