wawawa
wawawa

Reputation: 21

Union of two trees in Haskell

I'm struggling to model this problem's domain in haskell.

I know how to solve this in procedural language (with loops, this is rather easy), but can't seem to figure out how I would do this in Haskell.

I want to take merge (union) of RoseDirTree.

data RoseDirTree
      = File Text Int
      | Folder Text [RoseDirTree]
      deriving (Show, Eq, Ord)

-- | Merges two dir trees
-- If file names are same, prefers the right hands side.
mergeDirTree :: RoseDirTree -> RoseDirTree -> RoseDirTree
mergeDirTree = undefined

Here is an example case:

archivesFromS3 :: RoseDirTree
archivesFromS3 = Folder "/"
      [ Folder "2011"
            [ File "jan.dump" 0
            , File "feb.dump" 0
            ]
      , Folder "2012"
            [ File "jan.dump" 0
            ]
      ]

archivesFromLocal :: RoseDirTree
archivesFromLocal = Folder "/"
      [ Folder "2011"
            [ File "jan.dump" 1
            , File "march.dump" 1
            ]
      , Folder "2019"
            [ File "jan.dump" 1
            ]
      ]

-- | archivesFromS3 `mergeDirTree` archivesFromLocal
expectedMerged :: RoseDirTree
expectedMerged = Folder "/"
      [ Folder "2011"
            [ File "jan.dump" 1
            , File "feb.dump" 0
            , File "march.dump" 1
            ]
      , Folder "2012"
            [ File "jan.dump" 0
            ]
      , Folder "2019"
            [ File "jan.dump" 1]
      ]

Looking at this from more of tree problem perspective, my initial thought are the structure has to be in form of:

mergeTree :: RoseDirTree -> RoseDirTree -> RoseDirTree
mergeTree (File lhsName idl) (File rhsName idr) = 
      if lhsName == rhsName 
      then File rhsName idr 
      else error "can't merge two different file"
mergeTree (Folder lhsName childl) (Folder rhsName childr) = 
      if lhsName == rhsName 
      then Folder rhsName (childl <> childr) 
      else error "can't merge two different folder"
mergeTree _ _ = error "cannot merge file and folder"

But in-evidently, this approach:

  1. Does not handle the case when Folder has nested folder with same name.

I'm really not sure where to go next. Any pointers appreciated.

Upvotes: 1

Views: 361

Answers (1)

Noughtmare
Noughtmare

Reputation: 10645

The part that is problematic is childl <> childr. Instead, define a new function:

mergeChildren :: [RoseDirTree] -> [RoseDirTree] -> [RoseDirTree]
mergeChildren = undefined

This function should call the mergeTree function at some point.

A thing to consider is whether you are allowed to assume that the input file and folder names are sorted or not. If they are sorted you'll be able to implement it in a slightly easier way. You can also consider sorting them yourself using Data.List.sortOn if you are allowed to use that function.

If the inputs are sorted then this function can be similar in structure to the merge of a mergesort, e.g. this one by Redu:

merge :: Ord a => [a] -> [a] -> [a]
merge [] ys         = ys
merge xs []         = xs
merge (x:xs) (y:ys) | x < y     = x:merge xs (y:ys)
                    | otherwise = y:merge (x:xs) ys

The difference is that you need a different comparison function and you need to do something different if the folder or files are equal.

Spoiler

Here's how I would implement the mergeChildren function assuming the inputs are sorted. This still requires you to implement a custom comparison function.

-- | Compares the names of the top level file or folder
-- Returns LT if the first input is a file and the second a folder
-- Returns GT if the first input is a folder and the second a file 
compareTree :: RoseDirTree -> RoseDirTree -> Ordering
compareTree = undefined

mergeChildren :: [RoseDirTree] -> [RoseDirTree] -> [RoseDirTree]
mergeChildren [] ys         = ys
mergeChildren xs []         = xs
mergeChildren (x:xs) (y:ys) = 
  case compareTree x y of
    LT -> x             : mergeChildren xs (y:ys)
    EQ -> mergeTree x y : mergeChildren xs ys
    GT -> y             : mergeChildren (x:xs) ys

Upvotes: 1

Related Questions