Pete
Pete

Reputation: 367

Find rows with 'not like each other' column 1 when column 2 matches

I have one table with 2 columns in database fls2 which is Sqlite;

name        |    sha256
------------|------------------
ab/ac/ad    |    12345
ab/ad/af    |    12345
zx/ad/af    |    12345

I would like to find names where 'name like 'ab%' and 'name not like 'ab%' are both true for a particular sha256. So in the above case the 3 rows share a sha256 of '12345', and I would like to consider that a sub dataset. Within that dataset, if both 'name like 'ab%' and 'name not like 'ab%' are true (for 2 or more different rows obviously) I would like all the rows returned.

What I am doing is searching for cases where an identical file (identified by its sha256) is present in 2 different top level directories.

I know how to do this in perl after fetching the data but ideally if I could do this in the DB it would be far better. Ive tried

select name 
from 
    fls2 
where 
    sha256 = (select sha256 from fls2 where name like 'ab%') 
and 
    name not like 'ab%';

But its not returning any rows (and I know there are at least a few because I have found them manually).

Upvotes: 0

Views: 60

Answers (3)

GMB
GMB

Reputation: 222492

Within that dataset, if both 'name like 'ab%' and 'name not like 'ab%' are true (for 2 or more different rows obviously) I would like all the rows returned.

You can use window functions:

select name, sha256
from (
    select 
        f.*,
        max(case when name like 'ab%' then 1 end) over(partition by sha256) max_ab,
        max(case when name not like 'ab%' then 1 end) over(partition by sha256) max_not_ab
    from fls2 f
) t
where max_ab = 1 and max_not_ab = 1

In the subquery, the window max()s check if a record exists with the same sha256 and a name that starts (resp. does not start) with 'ab%'. Then, the outer query filters on records that satisfy both conditions.

Upvotes: 0

forpas
forpas

Reputation: 164099

Use EXISTS:

select * from fls2 f
where
  exists (select 1 from fls2 where sha256 = f.sha256 and name like 'ab%')
  and
  exists (select 1 from fls2 where sha256 = f.sha256 and name not like 'ab%')

See the demo.
Or with sum() window function:

select f.name, f.sha256
from (
  select *, 
    sum(name like 'ab%') over (partition by sha256) sum1,
    sum(name not like 'ab%') over (partition by sha256) sum2
  from fls2
) f
where f.sum1 > 0 and f.sum2 > 0

See the demo.
Results:

| name     | sha256 |
| -------- | ------ |
| ab/ac/ad | 12345  |
| ab/ad/af | 12345  |
| zx/ad/af | 12345  |

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1269953

Use aggregation and having:

select sha226, group_concat(name) as names
from t
group by sha226
having sum(case when name like 'ab%' then 1 else 0 end) > 0 and
       sum(case when name not like 'ab%' then 1 else 0 end) > 0;

This puts all the names in a list on the same row.

Upvotes: 1

Related Questions