thebigpeeler
thebigpeeler

Reputation: 95

SQLite delete rows based on multiple columns

im pretty new to SQLite hence asking this question!

I need to remove rows in a table so that I have the earliest occurence of column each unique value in column X(colour) based on column Y (time).

Basically i have this:

test | colour | time(s)
one | Yellow | 8
one | Red | 6
one | Yellow | 10
two | Red | 4

Which i want to remove rows so that is looks like:

test | colour | time(s)
one | Yellow | 8
two | Red | 4

Thanks in advance!

EDIT: To be clearer i need to retain the Earliest occurence in time that each colour occurred, regardless of the test.

EDIT: I can select the rows i want to keep by doing this:

select * from ( select * from COL_TABLE order by time desc) x group by colour;

which produces the desired result, but i want to remove what is not there in the result of the select.

EDIT: The following worked thanks to @JimmyB:

   DELETE FROM COL_TABLE WHERE EXISTS ( SELECT * FROM COL_TABLE t2 WHERE     COL_TABLE .colour = t2.colour AND COL_TABLE .test = t2.test AND COL_TABLE .time < t2.time )

Upvotes: 2

Views: 2789

Answers (1)

JimmyB
JimmyB

Reputation: 12630

You can include subqueries (EXISTS/NOT EXISTS) in the WHERE clause of a DELETE statement.

Like subqueries in SELECTs, these can refer to the table in the outer statement to create matches.

In your case, try this:

DELETE FROM my_table
WHERE EXISTS (
  SELECT *
  FROM my_table t2
  WHERE my_table.colour = t2.colour
  AND my_table.test = t2.test
  AND my_table.time < t2.time
)

This statement uses three noteworthy constructs:

  • Subquery in DELETE
  • Self-join
  • Emulation of a MIN(...), via self-join

The subquery with EXISTS is mentioned above.

The self-join is required whenever one row of a table must be compared against other rows of the same table. Finding the minimum value of some column is exactly that.

Normally, you'd use the MIN(...) function to find the minimum. The minimum can be defined as the single value for which no lower value exists, and that's what we're using here because we're not actually interested in the actual value but only want to identify the record which contains that value.

(Since we're deleting, our SELECT yields all the non-minimum rows, which we want to delete to keep only the minimums.)

So, what the statement says is:

Delete all records from my_table for which there is at least one record in my_table with the same colour and the same test but a lower time.

Upvotes: 5

Related Questions