Dmitriy Grankin
Dmitriy Grankin

Reputation: 608

sqlalchemy: get ids where values does not exist

Having a table with columns listing_ids, keys how to get listing_ids (and corresponding missing keys) where values from list_of_values do not exist in keys?

list_of_values = [key2,key3]

table

listing_id  keys
424         key1
424         key2
424         key3
523         key1
2433        key2
2433        key1
53          key2
3           key3

I need to get the following a result:

listing_id  keys_that_does_not_exist
523         key2
523         key3
2433        key3
53          key3
3           key2

I tried:

ids_without_keys_q = session.query(Table)\
                             .filter(~ exists().where(Table.key.in_(list_of_values))

I am using postgresql

Upvotes: 3

Views: 922

Answers (1)

Ruben Helsloot
Ruben Helsloot

Reputation: 13129

The question is how many return values you expect. PostgreSQL is not very good with generated data, like this solution uses, so if it's too slow, it might be faster to just fetch a list of all combinations and use Python to find the non-existing ones.

Also, I assume here based on your query that there is one row per listing_id/key pair, and that you're not storing keys as a string array. If that is the case, let me know and I'll revise the answer.


  1. First, I assume that you don't want IDs where there are no matches, so instead of generating a list, you can construct it like so. I use the func.count() to filter out all listings that do have matches to all keys:
unique_incomplete_listings = session.query(Table.listing_id.label('listing_id'))
  .group_by(Table.listing_id)
  .having(func.count() < 3)
  1. Second, convert this into a CTE, then get all possible combinations of (listing, key) from the query:
from sqlalchemy.dialects.postgresql import array

unique_incomplete_listings = unique_incomplete_listings.cte()
all_potential_matches = session.query(
    unique_incomplete_listings.c.listing_id,
    # this creates a cartesian product of listing_ids to elements in the array
    func.unnest(array(['key1', 'key2', 'key3']))
)
  1. Remove any matches you do find in the database using EXCEPT
query = all_potential_matches.except_all(
  session.query(Table.listing_id, Table.key)
  # We join here to prevent doing extra work if the listing does have all keys.
  # If this makes the query slower, consider making the CTE a subquery and
  # removing this join instead
  .join(unique_incomplete_listings,
        unique_incomplete_listings.c.listing_id == Table.listing_id)
)

Upvotes: 3

Related Questions