Ma0
Ma0

Reputation: 15204

Complex filter in sqlite

I have a database that looks like that:

No.     |Item   |Fx         |Fy         |Fxy
-------------------------------------------------
1       |a      |1.865      |-2.650     |-1.982
2       |a      |0.816      |-2.742     |-1.287
3       |a      |1.779      |-3.366     |-2.188
4       |b      |-0.367     |-0.229     |2.243
5       |b      |-0.413     |0.736      |1.889
6       |c      |-1.150     |0.150      |1.637
7       |c      |-1.817     |-1.003     |3.429

What I want is to apply a comparative filter which has to do the following (in pseudo-code):

for each distinct item "k":
    go through all rows with "k" and:
      remove row "i" if row "j" exists with: Fx_j < Fx_i AND Fy_j < Fy_i AND abs(Fxy_j) > abs(Fxy_i)

The application of the above filter for the sample given above would yield the following:

No.     |Item   |Fx         |Fy         |Fxy
-----------------------------------------------
1       |a      |1.865      |-2.650     |-1.982  <-- leaves due to 3
2       |a      |0.816      |-2.742     |-1.287  <--survives
3       |a      |1.779      |-3.366     |-2.188  <--survives
4       |b      |-0.367     |-0.229     |2.243   <--survives
5       |b      |-0.413     |0.736      |1.889   <--survives
6       |c      |-1.150     |0.150      |1.637   <-- leaves due to 7
7       |c      |-1.817     |-1.003     |3.429   <--survives

My filter is implemented in Python atm. I am querying the database (SELECT * FROM table GROUP BY Item) and I then filter the results.

The problem with my approach is that it takes a really long time since the database is a couple of GBs big. I was wondering whether a better way exists comprising optimally solely of a probably complex sqlite statement.

Any help is appreciated.

Upvotes: 0

Views: 152

Answers (1)

Tomalak
Tomalak

Reputation: 338208

The follwing SQL query seems to do what you want.

SELECT t.No, t.Item, t.Fx, t.Fy, t.Fxy FROM data_table t
EXCEPT
SELECT a.No, a.Item, a.Fx, a.Fy, a.Fxy FROM
  data_table a
  INNER JOIN data_table b ON b.Item = a.Item AND a.No <> b.No
    AND b.Fx < a.Fx
    AND b.Fy < a.Fy
    AND ABS(b.Fxy) > ABS(a.Fxy)

Docs: SQLite Compound Select Statements

There might be implications in how the data is indexed. Experiment with different clustered indexes. Intuitively I would expect clustering by (Item, No) to yield the best results, but whether that's feasible or not for your use case depends on how often rows are added or removed in the table.

Upvotes: 2

Related Questions