Fuzzy
Fuzzy

Reputation: 3810

how to search for a given sequence of rows within a table in SQL Server 2008

The problem:

We have a number of entries within a table but we are only interested in the ones that appear in a given sequence. For example we are looking for three specific "GFTitle" entries ('Pearson Grafton','Woolworths (P and O)','QRX - Brisbane'), however they have to appear in a particular order to be considered a valid route. (See image below)

RowNum  GFTitle
------------------------------
   1    Pearson Grafton
   2    Woolworths (P and O)
   3    QRX - Brisbane
   4    Pearson Grafton
   5    Woolworths (P and O)
   6    Pearson Grafton
   7    QRX - Brisbane
   8    Pearson Grafton
   9    Pearson Grafton

So rows (1,2,3) satisfy this rule but rows (4,5,6) don't even though the first two entries (4,5) do.

I am sure there is a way to do this via CTE's but some help would be great.

Cheers

Upvotes: 1

Views: 147

Answers (2)

Andriy M
Andriy M

Reputation: 77677

Assuming RowNum has neither duplicates nor gaps, you could try the following method.

  1. Assign row numbers to the sought sequence's items and join the row set to your table on GFTitle.

  2. For every match, calculate the difference between your table's row number and that of the sequence. If there's a matching sequence in your table, the corresponding rows' RowNum differences will be identical.

  3. Count the rows per difference and return only those where the count matches the number of sequence items.

Here's a query that implements the above logic:

WITH SoughtSequence AS (
  SELECT *
  FROM (
    VALUES
      (1, 'Pearson Grafton'),
      (2, 'Woolworths (P and O)'),
      (3, 'QRX - Brisbane')
  ) x (RowNum, GFTitle)
)
, joined AS (
  SELECT
    t.*,
    SequenceLength = COUNT(*) OVER (PARTITION BY t.RowNum - ss.RowNum)
  FROM atable t
  INNER JOIN SoughtSequence ss
  ON t.GFTitle = ss.GFTitle
)
SELECT
  RowNum,
  GFTitle
FROM joined
WHERE SequenceLength = (SELECT COUNT(*) FROM SoughtSequence)
;

You can try it at SQL Fiddle too.

Upvotes: 0

Serg
Serg

Reputation: 2427

This is very simple using even good old tools :-) Try this quick-and-dirty solution, assuming your table name is GFTitles and RowNumber values are sequential:

SELECT a.[RowNum]
      ,a.[GFTitle]
      ,b.[GFTitle]
      ,c.[GFTitle]
 FROM [dbo].[GFTitles] as a
      join [dbo].[GFTitles] as b on b.RowNumber = a.RowNumber + 1
      join [dbo].[GFTitles] as c on c.RowNumber = a.RowNumber + 2
WHERE a.[GFTitle] = 'Pearson Grafton' and
      b.[GFTitle] = 'Woolworths (P and O)' and
      c.[GFTitle] = 'QRX - Brisbane'

Upvotes: 1

Related Questions