Reputation: 2435
my rows looks like this:
1
0 ----> Change! This row is of interest
1 ----> Change again.
1
1
1
1
1
0 ----> Change.
1 ----> Change.
There can be a million zeros before a new 1 and I only want the changes (the rows marked with Change). This would bring me a result with about 10 million rows less. We support SQLServer and PostGresSQL. It's ordered by a timestamp column. A 0 is system offline flag and 1 system online. A service reports this information at intervals and timestamps it.
Any ideas? Edit: There's plenty of other columns one is a timestamp column that determines the order. A 0 is system offline flag and 1 system online. A service reports this information at intervals and timestamps it.
Cheers
Upvotes: 3
Views: 6202
Reputation: 6094
Option #1: Using MS SQL SERVER 2008
Ordering with the timestamp, we can use the rank()
function and a temporary table. A CTE and a table variable can also be used. Performance is a tricky part, so I would suggest testing the three options if this is going to be repeated in the future. I'll show two examples:
TEMPORARY TABLE (try it in SQLFiddle):
select rank() OVER (ORDER BY order_timestamp) as 'Rank', status into temp1 from temp
select t1.status as status, case when t1.status - t2.status = 0 then 'not changed' else 'changed' end as changed
from temp1 t1, temp1 t2
where t1.Rank = t2.Rank + 1
drop table temp1
CTE (try it in SQLFiddle):
with CTE_temp as (
select rank() OVER (ORDER BY order_timestamp) as 'Rank', *
from temp
)
select t1.status as status, case when t1.status - t2.status = 0 then 'not changed' else 'changed' end as changed
from CTE_temp t1, CTE_temp t2
where t1.Rank = t2.Rank + 1
Option #2: Using MS SQL SERVER 2012
MS SQL SERVER 2012 introduced lead
and lag
(http://blog.sqlauthority.com/2011/11/15/sql-server-introduction-to-lead-and-lag-analytic-functions-introduced-in-sql-server-2012/).
In this case, option #1 is still valid, but you can also try @RomanPekar's solution.
UPDATE:
Based on @RomanPekar's comment (and someone's downvote), I have to say that a temporary table can perfectly perform better than a CTE and a table variable, especially when a large set of data is expected. The optimizer can use statistics from the temporary table to establish its query plan, and this can result in performance gains.
Likewise, depending on the use the OP wants to give the data afterwards (maybe more queries), the temporary table is still there, no new query has to be executed, and indexes can be used to improve performance in those cases.
BTW, hacking my answer and converting it to a CTE or a table variable is easy, so I would suggest the OP to test performance for the three cases if this is an operation he will repeat in the future.
Upvotes: -1
Reputation: 117345
we have to know how to find previous row, but the general solution would be (I'm supposing that your rows have column Date and it's unique)
select *
from temp as t1
outer apply
(
select top 1 *
from temp as t2
where t2.Date < t1.Date /* or you columns */
order by t2.Date desc /* or you columns */
)
where t2.value <> t1.value
EDIT: As I'm usually work on 2008 R2 SQL Server, I forgot about LAG and LEAD functions. So, based on @depesz answer for PostgreSQL, here's SQL Server version:
with CTE_temp as (
select
*,
lag([State]) over (order by [Time]) as State_Previous
from temp
)
select *
from CTE_temp
where State_Previous <> [State]
Upvotes: 0
Reputation:
OK. So based on a comment we know there is timestamp column. Let's assume this is named "event_when", and the 0/1 column is named "status".
So, we can:
with x as (
select
*,
lag(status) over (order by event_when) is distinct from status as interesting
from table
)
select * from x where interesting;
Upvotes: 6
Reputation: 33963
If you have ID's, and they are sequential, you can try something like:
SELECT table1.* FROM table table1, table table2
WHERE table1.id = table2.id- 1
AND table1.value != table2.value
It's a little hard to say without seeing the rest of your structure, but in the above case, value
is the column that contains the 0
or 1
and id
is the primary key. If you don't have an id column, or they are not incremental, then you may need to specify a more complex selector, or include more of your schema here.
Upvotes: 2