Reputation: 75

understanding cyclic/recursive data types in haskell

I'm trying to fully understand haskell's data types, so I created these:

data District = District {nameOfCity :: String, 
                          subDistricts ::[DistrictUnit]} 
                          deriving (Show, Eq)

data DistrictUnit = DistrictUnit District 
                    | BranchUnit Branch
                    deriving (Show, Eq)

data Branch = Branch {nameOfBranch :: String,
                      numOfEmployees :: Int,
                      sales :: Int}
                      deriving (Show, Eq)

Here are some examples:

a = District {nameOfCity = "Berlin", subDistrcits = []}

b = District {nameOfCity = "New York", 
             subDistricts = 
             [DistrictUnit
             (District {nameOfCity = "Amsterdam",
             subDistricts = 
             [DistrictUnit
             (District {nameOfCity = "London",
             subDistricts = []})]})]}

c = District {nameOfCity = "Iowa" , 
             subDistricts = 
             [BranchUnit
             (Branch {nameOfBranch = "Omaha",            
             numOfEmployees = 3, 
             sales = 2343})]}

Now I'm trying to build two functions: getNameOfCity and getNameOfBranch:

getNameOfCity b -> ["New York", "Amsterdam", "London"]

But I have massive pattern matching issues. I just don't know how to build a function that works for all inputs.

Here are my attempts:

getNameOfCity :: District -> [String]
getNameOfCity (District a b) = [a]

This function will only give me the first name:

["New York"]

So I tried this:

getNameOfCity (District a [DistrictUnit (District b c)]) = [a, b]

And (of course) that one will only give me the first two names:

["New York", "Amsterdam"]

How can I make that getNameOfCity work for all inputs? And will getNamesOfBranch work the same way (since it has different parameters)? Thank you in advance :)

Upvotes: 1

Answers (2)

Jon Purdy

Reputation: 54971

You want getNameOfCity applied to a District to return its nameOfCity field, prepended to the list of city names from each of its subDistricts that are Districts and not Branches.

So you had a good starting point with something like this:

getNameOfCity :: District -> [String]
getNameOfCity (District name units) = [name]

But of course you need to do something with units. The pattern you wrote in your second example, [DistrictUnit (District b c)], only matches when the subDistricts field contains a list of exactly one element, where that element is also a DistrictUnit, and extracts only the nameOfCity field (b) from that district, ignoring any further nested subdistricts (in c).

To help you with this, I’m going to give you a template for you to fill in, following a common pattern that’s very helpful when you’re getting familiar with Haskell: working back from the desired result, breaking down the problem into small steps, and giving each step a name and a type signature. That way, if you make a mistake, the compiler will produce much more comprehensible error messages. Then, once you have a working solution, you can remove the intermediate variables and consolidate the steps into a more compact expression.

First, your desired result is the root district name prepended to the list of names of all subdistricts.

getNameOfCity :: District -> [String]
getNameOfCity (District name units) = let

  allCities :: [String]
  allCities = …

  in name : allCities

To construct this, you’ll need to iterate over the units and extract the names from each district, then concatenate all the results.

getNameOfCity :: District -> [String]
getNameOfCity (District name units) = let

  districtCities :: [[String]]
  districtCities = …  -- List of list of city names for each ‘District’

  allCities :: [String]
  allCities = …  -- Concatenated ‘districtCities’

  in name : allCities

The iteration over the units can be done in a few ways, but what I would do is extract only the Districts first so you have a narrower, more precise type to work with, and then you can simply apply getNameOfCity recursively on them.

getNameOfCity :: District -> [String]
getNameOfCity (District name units) = let

  -- Extract only those ‘units’ that are ‘District’s
  districts :: [District]
  districts = …
    -- Hint:
    --   Hoogle ‘(a -> Maybe b) -> [a] -> [b]’
    --   and define ‘getDistrict :: DistrictUnit -> Maybe District’;
    --   or use a list comprehension of the form: ‘[… | … <- units]’

  districtCities :: [[String]]
  districtCities = …

  allCities :: [String]
  allCities = …

  in name : allCities

You could follow a very similar approach for the branch names; it’s a good exercise to consider how you might abstract over the commonalities, once you have both of them in front of you.

It’s also a common higher-level approach to solve this sort of problem using the Semigroup & Monoid typeclasses, which are mathy names for the sets of types that can be “appended” associatively in some way (the <> operator), and have a default “empty” value (mempty) which is a no-op for appending. The general method is to map each value to a “summary” value—here, the list of city names—and then fold those summaries together, conveniently combined into a single operation called foldMap. For instance:

districtCities :: District -> [String]
districtCities (District city units) = [city] <> foldMap unitCities units

unitCities :: DistrictUnit -> [String]

-- Recursively summarise district.
unitCities (DistrictUnit district) = districtCities district

-- Return “empty” value for branch.
unitCities (BranchUnit branch) = mempty

This generalises to accumulating other fields, like the branch names:

districtBranches :: District -> [String]
districtBranches (District _ units) = foldMap unitBranches units

unitBranches :: DistrictUnit -> [String]
unitBranches (DistrictUnit district) = districtBranches district
unitBranches (BranchUnit branch) = [nameOfBranch branch]

Or other types of values, like the total sales, using the Sum monoid:

import Data.Monoid (Sum(..))

totalSales :: District -> Int
totalSales district = getSum (districtSales district)
-- Or: totalSales = getSum . districtSales

districtSales :: District -> Sum Int
districtSales (District _ units) = foldMap unitSales units

unitSales :: DistrictUnit -> Sum Int
unitSales (DistrictUnit district) = foldMap districtSales district
unitSales (BranchUnit branch) = Sum (branchSales branch)

Or even multiple values at once, using e.g.: (Sum 1, ["London"]) <> (Sum 2, ["Paris"]) = (Sum 3, ["London", "Paris"]). Notice how all these implementations have similar structure, too, though, and consider how you might abstract over this “folding” pattern generically.

Upvotes: 0

leftaroundabout

Reputation: 120711

Rather than trying to do everything with a single function, it's easier to write separate helpers:

allDistrictNames :: District -> [String]  -- This was your `getNameOfCity`
allDistrictNames_u :: DistrictUnit -> [String]
allDistrictNames_ul :: [DistrictUnit] -> [String]

allDistrictNames and allDistrictNames_u can now be implemented with simple pattern matches and calls to each other. allDistrictNames_ul needs to call allDistrictNames_u on all list elements, and combine the results. There's a standard function for this exact purpose.

Upvotes: 1

understanding cyclic/recursive data types in haskell

Answers (2)

Related Questions