elfeck
elfeck

Reputation: 385

SQL: List-Field contains sublist

Quick preface: I use the SQL implementation persistent (Haskell) and esqueleto.

Anyway, I want to have a SQL table with a column of type [String], i.e. a list of strings. Now I want to make a query which gives me all the records where a given list is a sublist of the one in the record.

For instance the table with

ID  Category
0   ["math", "algebra"]
1   ["personal", "life"]
2   ["algebra", "university", "personal"]

with a query of ["personal", "algebra"] would return only the record with ID=2, since ["personal", "algebra"] is a sublist of ["algebra", "university", "personal"].

Is a query like this possible with variable-length of my sought-after sublist and "basic" SQL operators?

If someone knows their way around persistent/esqueleto that would of course be awesome.

Thanks.

Upvotes: 3

Views: 913

Answers (2)

ruben.moor
ruben.moor

Reputation: 1965

Expanding on the comment of Gordon Linoff and the previous answer:

SQL databases are sometimes limited in their power. Since the order of your Strings in [String] does not seem to matter, you are trying to put something like a set into a relational database and for your query you suggest something like a is a subset of operator.

If there was a database engine that provides those structures, there would be nothing wrong about using it (I don't know any). However, approximating your set logic (or any logic that is not natively supported by the database) has disadvantages:

  • You have to explicitly deal with edge cases (cf. xnyhps' answer)
  • Instead of hiding the complexity of storing data, you need to explicitly deal with it in your code
  • You need to study the database engine rather than writing your Haskell code
  • The interface between database and Haskell code becomes blurry

A mightier approach is to reformulate your storing task to something that fits easily into the relational database concept. I.e. try to put it in terms of relations. Entities and relations are simple, thus you avoid edge cases. You don't need to bother how exactly the db backend stores your data. You don't have to bother with the database much at all. And your interface is reduced to rather straightforward queries (making use of joins). Everything that cannot be (comparatively) easily realized with a query, belongs (probably) into the Haskell code.

Of course, the details differ based on the specific circumstances.

In your specific case, you could use something like this:

Table: Category
ID Description
0  math
1  algebra
2  personal
3  life
4  university

Table: CategoryGroup
ID  CategoryID
0   0
0   1
1   2
1   3
2   1
2   4
2   2

... where the foreign key relation allows to have groups of categories. Here, you are using a relational database where it excels. In order to query for CategoryGroup you would join the two tables, resulting in a result of type

[(Entity CategoryGroup, Entity Category)]

which I would transform in Haskell to something like

[(Entity CategoryGroup, [Entity Category])]

Where the Category entities are collected for each CategoryGroup (that requires deriving (Eq, Ord) in your CategoryGroup-Model).

The set-logic as described above, for a given List cs :: [Entity Category], would then go like

import qualified Data.Set as Set
import Data.Set (isSubsetOf)
let s = Set.fromList ["personal", "algebra"]
let s0 = Set.fromList $ map (categoryDescription . entityVal) cs
if s `isSubsetOf` s0 -- ... ?

Getting used to the restrictions of relational databases can be annoying in the beginning. I guess, for something of central importance (persisting data) a robust concept is often better than a mighty one and it pays out to always know what your database is doing exactly.

Upvotes: 3

xnyhps
xnyhps

Reputation: 3316

By using [String], persistent converts the entire list to a quoted string, making it very hard to work with from SQL.

You can do something like:

mapM (\cat ->
         where_ (x ^. Category `like` (%) ++. val (show cat) ++. (%)))
     ["personal", "algebra"]

But this is very fragile (may break when the categories contain ", etc.).

Better approaches are:

  1. You could do the filtering in Haskell if the database is small enough.

  2. It would be much easier to model your data as:

    Objects:

    ID  ... 
    0   ...
    1   ...
    2   ...
    

    ObjectCategories:

    ObjectID  Category
    0   math
    0   algebra
    1   personal
    1   life
    2   algebra
    2   university
    2   personal
    

Upvotes: 1

Related Questions