Eugene Lisitsky
Eugene Lisitsky

Reputation: 12845

Select from table removing similar rows - PostgreSQL

There is a table with document revisions and authors. Looks like this:

doc_id    rev_id   rev_date            editor    title,content so on....
123        1       2016-01-01 03:20    Bill        ......
123        2       2016-01-01 03:40    Bill
123        3       2016-01-01 03:50    Bill
123        4       2016-01-01 04:10    Bill
123        5       2016-01-01 08:40    Alice
123        6       2016-01-01 08:41    Alice
123        7       2016-01-01 09:00    Bill
123        8       2016-01-01 10:40    Cate
942        9       2016-01-01 11:10    Alice
942       10       2016-01-01 11:15    Bill
942       15       2016-01-01 11:17    Bill
  

I need to find out moments when document was transferred to another editor - only first rows of every edition series.

Like so:

doc_id    rev_id   rev_date            editor    title,content so on....
123        1       2016-01-01 03:20    Bill        ......
123        5       2016-01-01 08:40    Alice
123        7       2016-01-01 09:00    Bill
123        8       2016-01-01 10:40    Cate
942        9       2016-01-01 11:10    Alice
942       10       2016-01-01 11:15    Bill
  

If I use DISTINCT ON (doc_id, editor) it resorts a table and I see only one per doc and editor, that is incorrect. Of course I can dump all and filter with shell tools like awk | sort | uniq. But it is not good for big tables.

Window functions like FIRST_ROW do not give much, because I cannot partition by doc_id, editor not to mess all them.

How to do better?

Thank you.

Upvotes: 0

Views: 28

Answers (1)

Gordon Linoff
Gordon Linoff

Reputation: 1269513

You can use lag() to get the previous value, and then a simple comparison:

select t.*
from (select t.*,
             lag(editor) over (partition by doc_id order by rev_date) as prev_editor
      from t
     ) t
where prev_editor is null or prev_editor <> editor;

Upvotes: 1

Related Questions