Alvin Chin
Alvin Chin

Reputation: 61

How to iterate through the result of a SELECT query to find a row pattern using U-SQL

I have the result of a SELECT query in a variable, and now I want to iterate through the query result row by row to do some processing, like finding a particular pattern. For example, the pattern could be the following:

a, b, c, d, e
b, c, d, e, f
c, d, e, f, g

And the result of the SELECT query in CSV could be:

1, 2, 3, 4, 5
3, 4, 5, 6, 7
a, b, c, d, e
b, c, d, e, f
c, d, e, f, g
5, 6, 7, 8, 9

I've seen something about a PROCESS statement using a custom extractor, but is this the way to do it? I am not sure how this PROCESS and extractor works.

https://msdn.microsoft.com/en-us/library/azure/mt621322.aspx

Thanks for any help.

Upvotes: 1

Views: 153

Answers (1)

wBob
wBob

Reputation: 14389

I don't think you need to iterate. Would a more set-based approach work for you? Try this sample U-SQL script I created. Basically if there is a match the result will be in the file, if there is no match the file will be empty.

// Set the search pattern
@pattern = 
    SELECT *
    FROM ( VALUES 
            ( "a", "b", "c", "d", "e" ),
            ( "b", "c", "d", "e", "f" ),
            ( "c", "d", "e", "f", "g" )
           ) AS t (col1, col2, col3, col4, col5 );


// Get the file to search
@input =
    EXTRACT col1 string,
            col2 string,
            col3 string,
            col4 string,
            col5 string
    FROM "/input/input.csv"
    USING Extractors.Csv();


// Add rowIds
@pattern =
    SELECT ROW_NUMBER() OVER() AS rowId, *
    FROM @pattern;

@input =
    SELECT ROW_NUMBER() OVER() AS rowId, *
    FROM @input;


// Check the same rows appear in the same order
@temp =
    SELECT i.rowId,
           p.rowId == null ? 0 : ROW_NUMBER() OVER( ORDER BY p.rowId ) AS rowId2    // Restarts the row numbering when there is a match
    FROM @input AS i
         LEFT OUTER JOIN
             @pattern AS p
         ON i.col1 == p.col1
            AND i.col2 == p.col2
            AND i.col3 == p.col3
            AND i.col4 == p.col4
            AND i.col5 == p.col5;


@output =
    SELECT p.*
    FROM @pattern AS p
         INNER JOIN
             @temp AS t
         ON p.rowId == t.rowId2;


@pattenRecords =
    SELECT COUNT( * ) AS records
    FROM @pattern;


@records =
    SELECT COUNT( * ) AS records
    FROM @output;


// Join criteria mean output file will be empty if there has not been a match
@output =
    SELECT o.*
    FROM @output AS o
       CROSS JOIN @records AS r
       INNER JOIN
          (
          SELECT *
          FROM @pattenRecords
          INTERSECT
          SELECT *
          FROM @records
          ) AS t ON r.records == t.records;



// Output results
OUTPUT @output
    TO "/output/output.csv"
USING Outputters.Csv();

Maybe there is a simpler way.

Upvotes: 1

Related Questions