d2907
d2907

Reputation: 902

SSIS - How to improve a work flow

Previously I have asked for a possible solution for a situation that I had to face in order to implement a sql query (which is implementing originally in access). I have reach a solution (after asking a lot) but I would like to know if anyone has another way to execute this query.

I have got two different tables, one in sql and another in oracle (S and O)

O(A, B, C) => PK=(A,B) and S(D,E,F) => PK = (D,E)

The query looks like this

SELECT A,B,C,E,F
FROM S INNER JOIN O ON
S.D = O.A      (Only one attribute of the PK in O)

S has over 10.000 registers and O more than 700 millions. Given this, is not logic to implement a merge join, or a look up because I will have only the first match between D and A. So I thought that it will be better to assemble the query in the Oracle side. To do this I have implemented an scheme like this.

enter image description here

With the sql I have executed this query:

with tmp(A) as ( select distinct D as A from S
)
select cast( select concat(' or A = ', A) 
             from tmp 
             for xml path('')) as nvarchar(max)) as ID

I am getting a string with the values that I gonna search on oracle. Finally in the data flow, I am creating an expression like this:

select A, B, C
from O
where A= '' + @ID

I downnload this values to sql server and then I am able to manipulate them as I wish.

The use of the foreach loop was necessary because I am storing the string of sql inside an object variable. I found that SSIS has some troubles with the nvarchar(max) variables.

Some considerations:

1) The Oracle database is administered for another area of the company and they only gives reading permissions over the tables.

2) The DBA of the sql server does not allow to download the O table on a staging area. Not possibilities of negotiations with him, besides, this tabla is updated every day with more registers. He only manages this server and does not have any authority over Oracle.

3) The solution that was given for some members of my team was to create a query in oracle between different tables that can give me the attributes of O that I need, as a result I could get more than 3 millions of register and not all of the attributes A are presented in S. Even more, some the values of D has been manipulated, so possibly they are not going to be present in O.

With this implementation I am getting more than 150.000 registers from Oracle. But I would like to know if another solution can be implemented or if there are other components that I can use to reach the same results. Believe me when I say that I have read, asked and searched a lot before to implement this flow.

Upvotes: 0

Views: 91

Answers (1)

acesargl
acesargl

Reputation: 569

EDITED:

Option 1 (You say that you cannot use this solution – but it would be the first one – the best)

Use a DBLink to let Oracle access S table (you must use Oracle Database Gateway). Create a view in Oracle joining O and S. And finally use linked server to let SQL Server access the Oracle Joining view and get the results.

The process is as follow:

Option 2 (You say that you cannot use this solution – but it would be the second one)

As your Oracle admins seem to be monsters that will kill you if they get their paws on you. Then you can try (if they let you create a table in oracle):

  • Create a linked server in SQL Server (to access Oracle from SQL Server). As I mentioned in the "normal case".
  • And Create a (temporary) table in Oracle schema with only 1 column (it will store D values from SQL Server)

    Everytime you need to evaluate your query execute in SQL Server:

    INSERT INTO ORACLE_LINKED_SERVER.ORACLE_OWNER.TEMP_TABLE SELECT DISTINCT D FROM S;

    SELECT * FROM OPENQUERY('SELECT * FROM ORACLE_OWNER.O WHERE A IN (SELECT D FROM ORACLE_OWNER.TEMP_TABLE)');

    And finally don't forget to delete the Oracle's temp table:

    DELETE * FROM ORACLE_LINKED_SERVER.ORACLE_OWNER.TEMP_TABLE;

Option 3 (If you have an Oracle license and one available host)

You can install your own Oracle server in your host and use Option 2.

Option 4

If your solution is really the only way out, then let's try to improve it a little bit.

As you know, your solution works but it is a little bit aggressive (you are transforming a relational algebra semijoin operator into a relational algebra selection operator with a monster condition). You say that the Oracle table is updated everyday with more register, but if the update rate of your tables are lower than your query rate then you can create a result cache that you can use while the tables S or O are not changed.

Proceed as follows:

Create a table in your SQL Server to store the Oracle result of your monster query. And before build and launch your query execute this:

SELECT last_user_update
  FROM sys.dm_db_index_usage_stats
  WHERE database_id = DB_ID( 'YourDatabaseName')
        AND OBJECT_ID=OBJECT_ID('S')

This returns the most recent time when your table S was update. Store this value in a table (create a new table or store this value in a typical parameter table).

Create your monster query. But before launch it, send this query to Oracle:

SELECT MAX(ORA_ROWSCN) 
  FROM O;

It returns the last SCN (System Change Number) that cause a change in the table. Store this value in a table (create a new table or store this value in a typical parameter table).

Launch the big query and store its result into the cache table.

Finally, when you need to repeat the big query, first execute in your SQL Server:

SELECT last_user_update
  FROM sys.dm_db_index_usage_stats
  WHERE database_id = DB_ID( 'YourDatabaseName')
        AND OBJECT_ID=OBJECT_ID('S')

And execute in Oracle:

SELECT MAX(ORA_ROWSCN) 
  FROM O;

If one or both values have changed with respect the one you have stored in your parameter table, then you must store them in the parameters table (updating the old values) and launch again the big query. But if none of the values have changed, then your cache is up to date, and you can use it.

Note that

Upvotes: 2

Related Questions