user1872391

Reputation: 61

List processing in Haskell

I am teaching myself Haskell and have run into a problem and need help.

Background:

type AInfo  =  (Char, Int)
type AList  =  [AInfo]       -- let's say [('a', 2), ('b', 5), ('a', 1), ('w', 21)]

type BInfo  =  Char
type BList  =  [BInfo]       -- let's say ['a', 'a', 'c', 'g', 'a', 'w', 'b']

One quick edit: The above information is for illustrative purposes only. The actual elements of the lists are a bit more complex. Also, the lists are not static; they are dynamic (hence the uses of the IO monad) and I need to keep/pass/"return"/have access to and change the lists during the running of the program.

I am looking to do the following:

For all elements of AList check against all elements of BList and where the character of the AList element (pair) is equal to the character in the Blist add one to the Int value of the AList element (pair) and remove the character from BList.

So what this means is after the first element of AList is checked against all elements of BList the values of the lists should be:

AList [('a', 5), ('b', 5), ('a', 1), ('w', 21)]

BList ['c', 'g', 'w', 'b']

And in the end, the lists values should be:

AList [('a', 5), ('b', 6), ('a', 1), ('w', 22)]

BList ['c', 'g']

Of course, all of this is happening in an IO monad.

Things I have tried:

  1. Using mapM and a recursive helper function. I have looked at both:

    Every element of AList checked against every element of bList -- mapM (myHelpF1 alist) blist and Every element of BList checked against every element of AList – mapM (myHelpF2 alist) blist

  2. Passing both lists to a function and using a complicated if/then/else & helper function calls (feels like I am forcing Haskell to be iterative; Messy convoluted code, Does not feel right.)

  3. I have thought about using filter with the character value of the AList element and BList to create a third list of Bool, then counting the number of True values and updating the Int value. Then use filter on BList to remove the BList elements that …… (again, this does not feel right; not very Haskell-like.)

Things I think I know about the problem:

The solution may be exceedingly trivial. So much so that the more experienced Haskellers will be muttering under their breath “what a noob” as they type their response.

Any pointers would be greatly appreciated. (mutter away….)

Upvotes: 3

Views: 1290

Answers (4)

Will Ness

Reputation: 71070

The operation you describe is pure, as @luqui points out, so we just define it as a pure Haskell function. It can be used inside a monad (including IO) by means of fmap (or do).

import Data.List

combine alist blist = (reverse a, b4) where

First we sort and count the B list:

  b = map (\g->(head g,length g)) . group . sort $ blist

We need the import for group and sort to be available. Next, we roll along the alist and do our thing:

  (a,b2) = foldl g ([],b) alist
  g (acc,b) e@(x,c) = case pick x b of 
                        Nothing -> (e:acc,b)
                        Just (n,b2) -> ((x,c+n):acc,b2)
  b3 = map fst b2
  b4 = [ c | c <- blist, elem c b3 ]

Now pick, as used, must be

  pick x [] = Nothing
  pick x ((y,n):t) 
     | x==y = Just (n,t)
     | otherwise = case pick x t of Nothing -> Nothing
                                    Just (k,r) -> Just (k, (y,n):r)

Of course pick performs a linear search, so if performance (speed) becomes a problem, b should be changed to allow for binary search (tree etc, like Map). The calculation of b4 which is filter (`elem` b3) blist is another potential performance problem with its repeated checks for presence in b3. Again, checking for presence in trees is faster than in lists, in general.

Test run:

> combine [('a', 2), ('b',5), ('a', 1), ('w', 21)] "aacgawb"

([('a',5),('b',6),('a',1),('w',22)],"cg")
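
The performance remark above (swapping the association list b for a Map) could be sketched roughly as follows. This is only an illustration, not the code from the answer: the name combineMap is made up here, and it assumes the same behaviour as combine, where the first pair with a given key absorbs every matching element of blist and later duplicates are left alone.

```haskell
import Data.List (mapAccumL)
import qualified Data.Map.Strict as Map

-- Hypothetical Map-based variant of combine (name and shape are assumptions).
combineMap :: Ord k => [(k, Int)] -> [k] -> ([(k, Int)], [k])
combineMap alist blist = (alist', blist')
  where
    -- count the occurrences of every element of blist
    counts0 = Map.fromListWith (+) [ (c, 1) | c <- blist ]

    -- walk alist left to right; the first pair with a given key takes
    -- the whole count, and the key is then removed from the map
    (remaining, alist') = mapAccumL step counts0 alist
    step m e@(x, c) = case Map.lookup x m of
      Nothing -> (m, e)
      Just n  -> (Map.delete x m, (x, c + n))

    -- keep only the elements of blist that no pair consumed
    blist' = filter (`Map.member` remaining) blist
```

With the data from the question, combineMap [('a', 2), ('b', 5), ('a', 1), ('w', 21)] "aacgawb" gives ([('a',5),('b',6),('a',1),('w',22)],"cg"), the same as the test run above. Lookups and the final membership checks are now logarithmic rather than linear.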

edit: you probably want it the other way around, rolling along the blist while updating the alist and producing (or not) the elements of blist in the result (b4 in my code). That way the algorithm will operate in a more local manner on long input streams (assuming your blist is long, though you didn't say that). As written above, it will have a space problem, since it consumes the input stream blist several times over. I'll keep it as is as an illustration, as food for thought.

So if you decide to go the 2nd route, first convert your alist into a Map (beware the duplicates!). Then, scanning (with scanl) over blist, make use of updateLookupWithKey to update the counts map and at the same time decide for each member of blist, one by one, whether to output it or not. The type of the accumulator will thus have to be (Maybe a, Map a Int), with a your element type (blist :: [a]):

-- for reference: scanl :: (acc -> a -> acc) -> acc -> [a] -> [acc]
import Data.Map (fromList, updateLookupWithKey)

scanning = tail $ scanl g (Nothing, fromList $ reverse alist) blist
g (_,cmap) a = case updateLookupWithKey (\_ c->Just(c+1)) a cmap of
                 (Just _, m2) -> (Nothing, m2)   -- present in counts: bump it, emit nothing
                 _            -> (Just a, cmap)  -- not present in counts 
new_b_list = [ a | (Just a,_) <- scanning ]
last_counts = snd $ last scanning

You will have to combine the toList last_counts with the original alist if you have to preserve the old duplicates there (why would you?).
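
If the duplicates do need to survive (the expected output in the question keeps the second ('a', 1) pair), one possible way to merge the final counts back into the original list is sketched below. The helper name mergeCounts is hypothetical, and it assumes the map was built so that each key stands for its first occurrence in alist, as fromList $ reverse alist does.

```haskell
import Data.List (mapAccumL)
import qualified Data.Map as Map

-- Hypothetical helper: the first pair with a given key takes the value
-- from the final counts map; later duplicates keep their old value.
mergeCounts :: Ord k => Map.Map k Int -> [(k, Int)] -> [(k, Int)]
mergeCounts finalCounts alist = snd (mapAccumL step finalCounts alist)
  where
    step m e@(x, _) = case Map.lookup x m of
      Just n  -> (Map.delete x m, (x, n))  -- first occurrence: use the updated count
      Nothing -> (m, e)                    -- duplicate (or unknown key): unchanged
```

For example, mergeCounts (Map.fromList [('a', 5), ('b', 6), ('w', 22)]) [('a', 2), ('b', 5), ('a', 1), ('w', 21)] yields [('a',5),('b',6),('a',1),('w',22)].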

Upvotes: 0

luqui

Reputation: 60463

A few pointers:

Don't use [(Char, Int)] for "AList". The data structure you are looking for is a finite map: Map Char Int. Particularly look at member and insertWith. toList and fromList convert from the representation you currently have for AList, so even if you are stuck with that representation, you can convert to a Map for this algorithm and convert back at the end. (This will be more efficient than staying in a list because you are doing so many lookups, and the finite map API is easier to work with than lists.)

I'd approach the problem as two phases: (1) partition out the elements of blist by whether they are in the map, (2) insertWith the elements which are already in the map. Then you can return the resulting map and the other partition.

I would also get rid of the meaningless assumptions such as that keys are Char -- you can just say they are any type k (for "key") that satisfies the necessary constraints (that you can put it in a Map, which requires that it is Orderable). You do this with lowercase type variables:

import qualified Data.Map as Map

sieveList :: (Ord k) => Map.Map k Int -> [k] -> (Map.Map k Int, [k])

Writing algorithms in greater generality helps catch bugs, because it makes sure that you don't use any assumptions you don't need.

Oh, also this program has no business being in the IO monad. This is pure code.
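
A minimal sketch of the two phases described above, filling in a body for sieveList. The body is one possible reading of the pointers, not code from this answer. The IO part is purely an assumption about how the asker might hold the changing lists: sieveRefs and demo are hypothetical names, and IORef is just one way to keep mutable state around the pure function.

```haskell
import Data.IORef (IORef, newIORef, readIORef, writeIORef)
import Data.List (foldl', partition)
import qualified Data.Map as Map

-- Pure core: (1) partition by membership, (2) bump the count for every hit.
sieveList :: Ord k => Map.Map k Int -> [k] -> (Map.Map k Int, [k])
sieveList counts xs = (counts', rest)
  where
    (hits, rest) = partition (`Map.member` counts) xs
    counts'      = foldl' (\m k -> Map.insertWith (+) k 1 m) counts hits

-- Hypothetical IO wrapper: read both pieces of state, run the pure
-- function, write the results back.
sieveRefs :: Ord k => IORef (Map.Map k Int) -> IORef [k] -> IO ()
sieveRefs countsRef pendingRef = do
  counts  <- readIORef countsRef
  pending <- readIORef pendingRef
  let (counts', pending') = sieveList counts pending
  writeIORef countsRef counts'
  writeIORef pendingRef pending'

-- Small demonstration on the question's data (without the duplicate 'a',
-- since a Map holds one value per key).
demo :: IO (Map.Map Char Int, String)
demo = do
  countsRef  <- newIORef (Map.fromList [('a', 2), ('b', 5), ('w', 21)])
  pendingRef <- newIORef "aacgawb"
  sieveRefs countsRef pendingRef
  (,) <$> readIORef countsRef <*> readIORef pendingRef
```

Running demo should produce (fromList [('a',5),('b',6),('w',22)],"cg"). All the logic lives in sieveList; the IO layer only moves state in and out.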

Upvotes: 2

The Internet

Reputation: 8103

While I am by no means a Haskell expert, I have a partial attempt that returns the result of the operation applied once. Maybe you can find out how to map it over the rest to get your solution. addWhile is the clever part: since you only want to update the first occurrence of an element in lista, if it exists twice, it will just add 0 to the later ones. Code critiques are more than welcome.

import Data.List
type AInfo = (Char, Int)
type AList = [AInfo]

type BInfo = Char
type BList = [BInfo]

lista = ([('a', 2), ('b',5), ('a', 1), ('w', 21)] :: AList)
listb = ['a','a','c','g','a','w','b']

--step one, get the head, and its occurrences
items list = (eleA, eleB) where
        eleA = length $ filter (\x -> x == (head list)) list
        eleB = head list

getRidOfIt list ele = filter (\x -> x /= ele) list --drop like it's hot (every occurrence; dropWhile would only drop the leading run)

--add to lista
addWhile :: [(Char, Int)] -> Char -> Int -> [(Char,Int)]    
addWhile [] _ _ = []
-- the first matching pair takes the whole count; any later match adds 0
addWhile ((x,y):xs) letter times = if x == letter then (x,y+times) : addWhile xs letter 0
                                   else (x,y) : addWhile xs letter times

--first answer
firstAnswer = addWhile lista (snd $ items listb) (fst $ items listb)
--[('a',5),('b',5),('a',1),('w',21)]
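
One possible way to "map it over the rest", offered as a sketch rather than as a finished version of the code above: repeat the single step for every distinct character of listb that appears as a key in lista. The names bumpFirst and processAll are made up for this illustration.

```haskell
import Data.List (nub)

-- Add n to the first pair whose key is c; leave everything else alone.
bumpFirst :: Char -> Int -> [(Char, Int)] -> [(Char, Int)]
bumpFirst _ _ [] = []
bumpFirst c n ((x, y) : rest)
  | x == c    = (x, y + n) : rest
  | otherwise = (x, y) : bumpFirst c n rest

-- Hypothetical driver: apply the single step once per distinct character
-- of blist that occurs as a key in alist, then drop those characters.
processAll :: [(Char, Int)] -> [Char] -> ([(Char, Int)], [Char])
processAll alist blist = (alist', blist')
  where
    present = [ c | c <- nub blist, c `elem` map fst alist ]
    count c = length (filter (== c) blist)
    alist'  = foldl (\acc c -> bumpFirst c (count c) acc) alist present
    blist'  = filter (`notElem` present) blist
```

processAll [('a', 2), ('b', 5), ('a', 1), ('w', 21)] "aacgawb" evaluates to ([('a',5),('b',6),('a',1),('w',22)],"cg"). It rescans the lists many times, so it is only meant for small inputs.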

Upvotes: 0

Father Deus

Reputation: 1

import Data.List

type AInfo  =  (Char, Int)
type AList  =  [AInfo]

type BInfo  =  Char
type BList  =  [BInfo]

process :: AList -> BList -> AList
process [] _ = []
process (a:as) b = if is_in a b then (fst a,snd a + 1):(process as (delete (fst a) b)) else a:process as b where
        is_in f [] = False
        is_in f (s:ss) = if fst f == s then True else is_in f ss

*Main> process [('a',5),('b',5),('a',1),('b',21)] ['c','b','g','w','b']
[('a',5),('b',6),('a',1),('b',22)]
*Main> process [('a',5),('b',5),('a',1),('w',21)] ['c','g','w','b']
[('a',5),('b',6),('a',1),('w',22)]

Probably an important disclaimer: I'm rusty at Haskell to the point of ineptness, but as a relaxing midnight exercise I wrote this thing. It should do what you want, although it doesn't return a BList. With a bit of modification, you can get it to return an (AList,BList) tuple, but methinks you'd be better off using an imperative language if that kind of manipulation is required.

Alternately, there's an elegant solution and I'm too ignorant of Haskell to know it.
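
The modification mentioned above, returning an (AList, BList) tuple, might look something like the following. This is a sketch that keeps the behaviour of process as written (each matching pair gains 1 and consumes one occurrence from the BList); processBoth is a hypothetical name.

```haskell
import Data.List (delete)

type AList = [(Char, Int)]
type BList = [Char]

-- Same traversal as process, but the leftover BList is threaded through
-- the recursion and returned alongside the updated AList.
processBoth :: AList -> BList -> (AList, BList)
processBoth [] bs = ([], bs)
processBoth (a@(c, n) : as) bs
  | c `elem` bs = let (as', bs') = processBoth as (delete c bs)
                  in  ((c, n + 1) : as', bs')
  | otherwise   = let (as', bs') = processBoth as bs
                  in  (a : as', bs')
```

processBoth [('a', 5), ('b', 5), ('a', 1), ('w', 21)] "cgwb" returns ([('a',5),('b',6),('a',1),('w',22)],"cg").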

Upvotes: 0
