SQL query to find difference in text between different rows

Question

I have a table that resembles the following:

                     INITIAL TABLE
-------------------------------------------------------

ID        Date           Author      Data1       Data9
==  ================   ==========   ======= ... =======
1   2017-06-04 16:47   John Smith     Foo         Bar
2   2017-06-04 16:50   John Smith     Goo         Bar
3   2017-06-04 16:52   John Smith     Hoo         Car
4   2017-06-04 16:55   Bill Plith     Foo         Bar
5   2017-06-04 16:59   John Smith     Foo         Car
6   2017-06-04 17:04   Bill Plith     Foo         Bar

The data comes from users changing a data set from a central source. The user's name, the datetime that the change was made, as well as a flush of all data fields' values are appended to the table whenever a change is made to any data field.

The problem is that it's tedious to manually check over each column to see where the actual change was made. Sometimes a change is made to one field, sometimes a change is made to multiple fields, and sometimes no changes are made at all (if the user clicked "Save Changes" with no changes actually made, a new row will still get added to the table).

What I want to do is query the table to produce results similar to the following:

                     CHANGE TABLE
--------------------------------------------------------

      Date           Author              Changes
================   ==========   ========================
2017-06-04 16:50   John Smith   Data1 was changed to Goo
2017-06-04 16:52   John Smith   Data1 was changed to Hoo
2017-06-04 16:52   John Smith   Data9 was changed to Car
2017-06-04 16:55   Bill Plith   Data1 was changed to Foo
2017-06-04 16:55   Bill Plith   Data9 was changed to Bar
2017-06-04 16:59   John Smith   Data9 was changed to Car
2017-06-04 17:04   Bill Plith   Data9 was changed to Bar

Some more things to note:

If no changes were made from Data1-9 in the Initial Table between a current row and its previous row, then no row should be appended to the Change Table.
If changes with the same Date(and time) and Author can be consolidated to a comma delimited list in the Changes column, that would be ideal too.
ID from the initial table will be ordered, but some IDs will be missing. For example, the IDs may be 1, 2, 3, 4, 7, 8, 10 and so on. Values can be missing, but the ordering on the initial table should still be based on the ID (rather than say, the Date).

I think that's everything. Thank you in advance to anyone who can assist, I'm still in the process of learning SQL so if I left out anything pertinent please do leave a comment and I'll fill in the gaps!

Gordon Linoff · Accepted Answer

You can use a giant case expression with lag(). Assuming none of the values are ever NULL:

select c.date, c.author,
       stuff( ((case when data1 <> lag(data1) over (partition by date) then ', data1' else '' end) +
               (case when data2 <> lag(data2) over (partition by date) then ', data2' else '' end) +
               . . .
              ), 1, 2, '') as changes
from changes c;

This can be modified to handle NULL values, although that complicates the expressions a bit.

SQL query to find difference in text between different rows

Answers (2)

Related Questions