Reputation: 31
I'm sure I'm not understanding this, but this is defined as
PATTERN: Specifying the Pattern to Match PATTERN ( ) The pattern defines a valid sequence of rows that represents a match. The pattern is defined like a regular expression (regex) and is built from symbols, operators, and quantifiers.
An example was:
For example, suppose that symbol S1 is defined as stock_price < 55, and symbol S2 is defined as stock price > 55. The following pattern specifies a sequence of rows in which the stock price increased from less than 55 to greater than 55:
PATTERN (S1 S2)
So if I do
create or replace table names (id int, name varchar (500), groupid int);
insert into names
select 1, 'andrew', 1
union
select 2, 'andrew2', 1
union
select 3, '3andrew', 1
And then I do
select * from names
match_recognize(
partition by groupid order by id
measures
classifier() as "classifier"
all rows per match
pattern (test test2)
define test as startswith(name, 'and'),
test2 as endswith(name, 'rew')
) t
;
Why do I not get 'andrew' as a record return? If I put either test in the pattern, it does show it. When I put both in, it does not. Instead it is showing 3andrew and andrew2 as the record result which is unexpected for me because the example lead me to believe it works like an AND. Any help is appreciated.
Upvotes: 1
Views: 750
Reputation: 175556
The pattern is (test test2)
which means find exactly one row starting with And
followed by exactly one that has rew
suffix. The pattern window has size 2.
The rows are processed as follow:
1 andrew
2 andrew2
3 3andrew
Pass 1:
1 andrew
2 andrew2 -- fails it is not test2
Pass 2(row 1 was skipped at this point):
2 andrew2 -- test pass
3 3andrew -- test2 pass
Return rows 2 and 3 and start the search process again(if they are still rows to process).
If you want arbitrary number of test, then pattern should be (test+ test2)
- (window size is at least 2).
If you want element to be one of the two then pattern should be (test|test2)
- (window size is 1).
Recommended reading: match_recognize — Regular Expressions Over Rows
Upvotes: 2
Reputation: 25903
so your pattern pattern (test test2)
as I understand it is say find a row where test
passes and latter row where test2
passes.
MATCH_RECOGNIZE says of the operator
(space) which you have between your two rules:
Concatenation. Specifies that a symbol or operation should follow another one. For example, S1 S2 means that the condition defined for S2 should occur after the condition defined for S1.
But I suspect your question is why does row id 1 not match on test
and then row 3 match on test2
.. and I believe it is the AFTER MATCH SKIP
the defaults to PAST LAST ROW
which find row 1 & 2 match test, so it starts after the last, thus you only get one match. Not that I could use that section to turn off this behavior.
If I change the input data a little:
WITH names(id, name, groupid) as (
SELECT * FROM VALUES
(10, 'andrew', 1),
(20, '2andrew', 1),
(30, 'andrew3', 1),
(40, 'simeon', 1),
(50, '4andrew', 1)
)
this gives:
ID NAME GROUPID classifier
10 andrew 1 TEST
20 2andrew 1 TEST2
So row 10 can work, but it seems in your data 2 binds closer to 3, but in my data 30 does not match 50. Also of note is you have no order by clause in you match, which means you might get non-deterministic results. Unless you have re-sorted you data elsewhere..
thus change adding a partition
and order
clause means both andrews now trigger:
WITH names(id, name, groupid) as (
SELECT * FROM VALUES
(10, 'andrew', 1),
(11, 'simeon', 2),
(20, '2andrew', 1),
(30, 'andrew3', 1),
(60, 'simeon', 1),
(50, '4andrew', 1)
)
select *
from names
match_recognize(
partition by groupid
order by id
measures
classifier() as "classifier"
all rows per match
pattern (test test2)
define test as startswith(name, 'and'),
test2 as endswith(name, 'rew')
) t
;
Upvotes: 0