hardikvd
hardikvd

Reputation: 34

SSIS Package - How to get nth record from recordset of 2000?

I have SSIS package with Source as SalesForce Object (through Pragmatic Task Factory). From SalesForce, I get a recordset of 2000 records. I want to get specific row number record (1st, 401th, 801th, 1201th & 1601th) from this recordset.

I have one way of doing this by adding all records in a temporary table and fire select query on temporary table. But I don't want to use that way as it is slowing down the process. I have to process total 5-7 million records.

Is there any way by which I can get nth record from record set of 2000 within SSIS package only? (through script task or any other way?)

Thank you so much for your help in advance.

Upvotes: 1

Views: 910

Answers (4)

Matt
Matt

Reputation: 14361

Ultimately I have no idea why you would always want only the the 400th record but I don't know the business case..... If you really are just trying to get a sampling of the data use ROW SAMPLING transformation and set how many records you want.

If you have a sequential numeric identity with no gaps in your source you could use a multi cast and perform an aggregations to determine, max, min, and count then use those with a conditional split to cut up the source. That's not exactly full proof though and gaps would definitely break the logic.

Anyway, the best way I know how to do it exactly as you describe would be through a script component [SC] in your Data Flow [DF]. The gist of the solutions is:

  • Add Script Component as a Transformation
  • Connect Source to Transformation Script Component
  • Add A New Output Column in SC
  • Edit script To Add Row Number code
  • Add Conditional Split
  • Do whatever you want to with the data.

I will focus on the script component.

Add the component and go to "Inputs and Outputs" section to define an output. In this example I will use RowNumber and Flag. The reason I will put both is to show you how to generate the actual number, but also how to just flag the record to make the logic easier in the conditional split.

enter image description here

Now go back to the "Script" section and choose "Edit Script" button at bottom of the page. Scroll through until you see public override void PreExecute(). This is the entry/starting point of the script.

For RowNumber, define a private variable to hold the current row number, I like to do this above the PreExecute(). In the PreExecute() method set the value of that variable to 0. Then at the Input0_ProcessInputRow function add the logic to determine if row number is 1, 401, etc. and set the Flag Output column value to true or false. Also increment the variable and assign it to the RowNumber Output Column.

enter image description here

So the code you would/modify add would be:

private int _rowNumber;
     public override void PreExecute()
    {
        base.PreExecute();
        _rowNumber = 0;
    }

    public override void Input0_ProcessInputRow(Input0Buffer Row)
    {
        /* setting flag before the increment so that it is a 0 based operation */
        if (_rowNumber == 0 || (_rowNumber % 400) == 0)
        {
            Row.Flag = true;
        }
        else
        {
            Row.Flag = false;
        }

        /* increment and assign the row number */
        _rowNumber++;
        Row.RowNumber = _rowNumber;
    }

Now just add that Conditional Split you should be good to go. The good news about the conditional split is because the Flag Column is never null and it is a Boolean your condition is literally just the column name Flag

enter image description here

So in the end your Data Flow will look something like this:

enter image description here

Note I tested this with 5,000 test records and it worked perfectly. Hope it helps you.

Upvotes: 1

user2460074
user2460074

Reputation: 1272

In your source do a query like as bellow :

with cte as (
select *, row_number() over(order by put_name_of_colum) as [RN]
    from your_table
    )
    select*from cte where RN in(1,401,801,1201,1601)

Upvotes: 0

Roger Wolf
Roger Wolf

Reputation: 7712

Depending on your circumstances, one of the following approaches might work.

The first way is to select only the rows you need in the first place. The query might look like this:

select sq.*
from (
    select t.*, row_number() over(order by <Put your sort order here>) as [RN]
    from dbo.Table t
    ) sq
where sq.RN % 400 = 1;

Another way might be to add a column in SSIS and number it incrementally - such as described here, for example. A simple Conditional Split placed right after it will do the trick.

Upvotes: 0

vercelli
vercelli

Reputation: 4767

You may add row_number() to your source query. Then do a conditional split asking for that row_number%2000 == 0.

Upvotes: 0

Related Questions