SQL "Group By" and Having in Haskell?

Question

I am trying to understand the semantics of SQL GROUP BY and HAVING clauses by mimicking it in Haskell with a minimal mock-up.

For example, the following SQL taken from the Postgres tutorial

SELECT city, max(temp_lo)
    FROM weather
    GROUP BY city
    HAVING max(temp_lo) < 40;

return the cities that have all lowest temperature values below 40.

My understanding is that HAVING depends on GROUP BY, so HAVING can be implemented on top of GROUP BY. To keep things simple, I use ordinal numbers to represent columns/fields. So here is what I have:

import Data.List
import Data.Function

at = flip (!!)

groupBySql n  = groupBy ((==) `on` (at n)). sortBy (comparing (at n) )

havingGroupBy f n tab = [ filter f | g <- groupBySql 2 tab ]

And the test data:

--mimick weather table: temp_lo, temp_hi, city
weather = [ [10, 30, 1], 
            [45, 99, 2],
            [0, 3, 3],
            [10, 35, 1],
            [55, 103, 2],
            [5, 29, 3]
          ]

test1 = havingGroupBy ( (<40).(at 0) ) 2 weather

But I have an error in writing havingGroupBy, and run out of time/energy.

*Main> test1

:53:1:
    No instance for (Show ([[b0]] -> [[b0]]))
      (maybe you haven't applied enough arguments to a function?)

A bunch of questions here:

Obviously, I didn't get the types right while mimicking the HAVING, how to fix it.
Is there a way to add type notation just for a parameter like f in havingGroupBy f n tab = [ filter f | g <- groupBySql 2 tab ], for debugging? My intended type for f is a boolean function on a row: [a]->Bool.
Is my understanding of GROUP BY and HAVING correct so far?

Zeta · Accepted Answer

^{Note: This post is written in literate Haskell. You can save it as Main.lhs and try it in your GHCi.}

To model this in Haskell, we should use the appropriate types:

> import Data.Function (on)
> import Data.List (maximumBy, sortBy, groupBy)
> import Data.Ord (comparing)
>
> type WEntry = (Int, Int, String)

> temp_lo (t, _, _) = t
> city    (_, _, c) = c

Now, a select in SQL can be thought as a function from one (or many) table(s) to a table of results:

> type Result = (Int, String)

> select :: [WEntry] -> [Result]
> select = map (\(l,h,c) -> (l, c))

Note that this does not account aggregation. Now, we group the entries by their cities:

> groupByCity :: [WEntry] -> [[WEntry]]
> groupByCity  = groupBy ((==) `on` city) . sortBy (comparing city)

Since both HAVING and the original SELECT use max(temp_lo), we aggregate the maximum lowest temperature here for some optimization:

> aggregate :: [[WEntry]] -> [WEntry]
> aggregate = map (maximumBy (comparing temp_lo))

You can think of groupByCity and GROUP BY as creating temporary sub tables. If we wouldn't use aggregates those tables would be concatenated in the end.

Now we need to filter those with a valid temperature:

> having :: [WEntry] -> [WEntry]
> having     = filter ((<40) . temp_lo)

And then put all things together:

> query :: [WEntry] -> [Result]
> query = select . having . aggregate . groupByCity

An alternative would have been

havingCity :: [[WEntry]] -> [[WEntry]]
havingCity = filter (not . any ((40<=) . temp_lo))
query = select . aggregate . havingCity . groupByCity

The important part is that an aggregate usually folds intermediate results. If you don't use grouping, max(temp_lo) will return a single value (the maximum lowest temperature of the whole table). If you group, max(temp_lo) will be maximum lowest temperature of a specific group. Likewise HAVING makes it possible to check the result an aggregate would return.

Last but not least, here's a main so that you can actually try it:

> weather :: [WEntry]
> weather =
>   [ (10,  30, "Berlin")
>   , (45,  99, "San Francisco")
>   , ( 0,   3, "Springfield")
>   , (14,  35, "Berlin")
>   , (55, 103, "San Francisco")
>   , ( 5,  29, "Springfield")
>   ]
>
> main = print $ query weather
> -- Result: [(14,"Berlin"),(5,"Springfield")]

SQL "Group By" and Having in Haskell?

Answers (1)

Related Questions

SQL &quot;Group By&quot; and Having in Haskell?

Answers (1)

Related Questions

SQL "Group By" and Having in Haskell?