Jan Möller
Jan Möller

Reputation: 321

Distinct duplicate rows based on one property

I have the following data:

ID| TimeStamp| Data
-------------------
 1| t+1      |A
 2| t+2      |B
 3| t+3      |A
 4| t+4      |A
 5| t+5      |D

I want to get all unique rows, based on Data. If there are some values next to each other, i want to get the newest one (TimeStamp).

My desired result:

ID| TimeStamp| Data
-------------------
 1| t+1      |A
 2| t+2      |B
 4| t+4      |A
 5| t+5      |D

Row with ID 3 should be filtered (t+4 > t+3) but ID 1 should remain (because there is Data B between).

How would a query for this look like in sql/linq?

Upvotes: 0

Views: 56

Answers (3)

raznagul
raznagul

Reputation: 375

Here is a solution with Linq:

class Row
{
    public int Id { get; set; }
    public DateTime Timestamp { get; set; }
    public string Data { get; set; }
}

var testData = new List<Row>
{
    new Row { Id = 1, Timestamp = DateTime.Now.AddHours(-5), Data = "A" },
    new Row { Id = 2, Timestamp = DateTime.Now.AddHours(-4), Data = "B" },
    new Row { Id = 3, Timestamp = DateTime.Now.AddHours(-3), Data = "A" },
    new Row { Id = 4, Timestamp = DateTime.Now.AddHours(-1), Data = "A" },
    new Row { Id = 5, Timestamp = DateTime.Now.AddHours(-2), Data = "A" },
    new Row { Id = 6, Timestamp = DateTime.Now.AddHours(-0), Data = "D" },
};

var orderedData = testData.OrderBy(row => row.Timestamp); //The list needs to be in correct order for the where statement to work.

var filteredResult = 
    orderedData.Where((row, i) => 
        i + 1 == testData.Count || row.Data != testData[i + 1].Data); //Current row is last row OR Data of current row is different from next row.

foreach (var row in filteredResult)
{
    Console.WriteLine($"ID: {row.Id}, Timestamp: {row.Timestamp}, Data: {row.Data}");
}

Upvotes: 0

kiran gadhe
kiran gadhe

Reputation: 743

    SELECT *
    FROM your_table YT1
    WHERE NOT EXISTS (
    SELECT *
    FROM your_table YT2 yt2.data = yt1.data
        AND yt1.id = yt2.id + 1
    )

Upvotes: 0

Gordon Linoff
Gordon Linoff

Reputation: 1270081

You can do this with lead():

select id, timestamp, data
from (select t.*, lead(data) over (order by timestamp) as next_data
      from t
     ) t
where next_data <> data or next_data is null;

Upvotes: 1

Related Questions