A User
A User

Reputation: 862

Postgresql: Dynamic Regex Pattern

I have event data that looks like this:

 id | instance_id | value
 1  | 1           | a
 2  | 1           | ap
 3  | 1           | app
 4  | 1           | appl
 5  | 2           | b
 6  | 2           | bo
 7  | 1           | apple
 8  | 2           | boa
 9  | 2           | boat
10  | 2           | boa
11  | 1           | appl
12  | 1           | apply

Basically, each row is a user typing a new letter. They can also delete letters.

I'd like to create a dataset that looks like this, let's call it data

 id | instance_id | value
 7  | 1           | apple
 9  | 2           | boat
12  | 1           | apply

My goal is to extract all the complete words in each instance, accounting for deletion as well - so it's not sufficient to just get the longest word or the most recently typed.

To do so, I was planning to do a regex operation like so:

select * from data
where not exists (select * from data d2 where d2.value ~ (d.value || '.'))

Effectively I'm trying to build a dynamic regex that adds matches one character more than is present, and is specific to the row it's matching against.

The code above doesn't seem to work. In Python, I can "compile" a regex pattern before I use it. What is the equivalent in PostgreSQL to dynamically build a pattern?

Upvotes: 3

Views: 691

Answers (2)

Abelisto
Abelisto

Reputation: 15624

To find peaks in the sequential data window functions is a good choice. You just need to compare each value with previous and next ones using lag() and lead() functions:

with cte as (
  select 
    *, 
    length(value) > coalesce(length(lead(value) over (partition by instance_id order by id)),0) and
    length(value) > coalesce(length(lag(value) over (partition by instance_id order by id)),length(value)) as is_peak
  from data)
select * from cte where is_peak order by id;

Demo

Upvotes: 1

krokodilko
krokodilko

Reputation: 36137

Try simple LIKE operator instead of regex patterns:

SELECT * FROM data d1
WHERE NOT EXISTS (
  SELECT * FROM data d2
  WHERE d2.value LIKE d1.value ||'_%'
)

Demo: https://dbfiddle.uk/?rdbms=postgres_9.6&fiddle=cd064c92565639576ff456dbe0cd5f39

Create an index on value column, this should speed up the query a bit.

Upvotes: 1

Related Questions