sgolay

Reputation: 21

Filtering String from List Haskell

I'm trying to write a program that reads a text file, then displays the frequencies and count of words in the file. What I need to do next is filter certain words from the text file. I have been looking at online resources for a couple of hours and still can't find what I'm looking for!

I have provided my code for the program so far below:

lowercase = map toLower
top doc = wordPairs
    where
        listOfWords = words (lowercase doc)
        wordGroups  = group (sort listOfWords)
        wordPairs   = reverse
                    $ sort
                    $ map (\x -> (length x, head x))
                    $ filterWords
                    wordGroups

filterWords :: String -> String
filterWords = filter (all (`elem` ["poultry outwits ants"])) . words

Upvotes: 1

Views: 615

Answers (2)

Gabriel Ciubotaru

Reputation: 1092

Here is my code, which solves your problem:

import Control.Arrow ((&&&))
import Data.Char (toLower)
import Data.List (group, sort, sortBy)

top :: String -> [(Int,String)] -- A type signature is always worth writing
top = sorter . wordFrequency . groups . filtered -- just compose the `where` functions
    where
        -- Remove the unwanted words
        filtered = filter (`notElem` ["poultry","outwits","ants"]) . words . map toLower
        -- Group equal words together
        groups = group . sort
        -- Create the pairs of (count, word)
        wordFrequency = map (length &&& head)
        -- Sort by count, descending; for ascending order, swap a and b
        sorter = sortBy (\ a b -> fst b `compare` fst a)
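
The remark about swapping a and b in `sorter` can be illustrated in isolation. The following is only a sketch: the names `byCountDesc`, `byCountDesc'` and `byCountAsc` are made up for this example, and the `Down`-based variant is an alternative spelling, not something the answer itself uses.

```haskell
import Data.List (sortBy, sortOn)
import Data.Ord (Down (..), comparing)

-- Descending by count, written the same way as `sorter` in the answer.
byCountDesc :: [(Int, String)] -> [(Int, String)]
byCountDesc = sortBy (\a b -> fst b `compare` fst a)

-- The same ordering expressed with Down (an alternative, not the answer's code).
-- Both sorts are stable, so pairs with equal counts keep their input order.
byCountDesc' :: [(Int, String)] -> [(Int, String)]
byCountDesc' = sortOn (Down . fst)

-- Ascending by count: a and b in their natural order.
byCountAsc :: [(Int, String)] -> [(Int, String)]
byCountAsc = sortBy (comparing fst)

main :: IO ()
main = do
  let pairs = [(1, "but"), (2, "like"), (1, "more")]
  print (byCountDesc pairs)   -- [(2,"like"),(1,"but"),(1,"more")]
  print (byCountDesc' pairs)  -- [(2,"like"),(1,"but"),(1,"more")]
  print (byCountAsc pairs)    -- [(1,"but"),(1,"more"),(2,"like")]
```

Because only the count is compared, words with the same count are not reordered alphabetically; this differs from `reverse . sort` on the pairs, which also compares the words.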

Upvotes: 1

karakfa

Reputation: 67567

It might be easier if you split the program in a different way. For example:

import Data.List(group,sort)
import Control.Arrow((&&&))

freq :: Ord a => [a] -> [(Int,a)]
freq = reverse . sort . map (length &&& head) . group . sort

The second part is defining the input to this function. You want to keep only certain elements:

select :: Eq a => [a] -> [a] -> [a]
select list = filter (`elem` list)

These make testing easier, since both functions are polymorphic and don't need input of one specific type.
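
As a quick illustration of that point, `select` can be tried on numbers as well as on words. The sketch below also adds `reject`, a hypothetical counterpart that is not part of this answer, on the assumption that the words in the question are meant to be removed rather than kept.

```haskell
-- Keep only the elements that appear in the given list (as defined above).
select :: Eq a => [a] -> [a] -> [a]
select list = filter (`elem` list)

-- Hypothetical counterpart: drop the elements that appear in the given list.
reject :: Eq a => [a] -> [a] -> [a]
reject list = filter (`notElem` list)

main :: IO ()
main = do
  print (select [1, 2] [1, 2, 3, 1 :: Int])           -- [1,2,1]
  print (reject ["ants"] (words "ants eat poultry"))  -- ["eat","poultry"]
```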

Finally, you can tie it all together:

freq $ select ["a","b","c"] $ words "a b a d e a b b b c d e c"

which gives you:

[(4,"b"),(3,"a"),(2,"c")]
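
To connect this back to the question, here is one possible way to wire the pieces into a whole program. It is a sketch under assumptions: `report` is a made-up helper, the unwanted words are taken from the question and are assumed to be words to drop, and the text is read from standard input rather than from a named file.

```haskell
import Control.Arrow ((&&&))
import Data.Char (toLower)
import Data.List (group, sort)

-- Count occurrences, most frequent first (as defined above).
freq :: Ord a => [a] -> [(Int, a)]
freq = reverse . sort . map (length &&& head) . group . sort

-- Hypothetical helper: lower-case the text, split it into words,
-- drop the unwanted ones, then count what is left.
report :: [String] -> String -> [(Int, String)]
report unwanted = freq . filter (`notElem` unwanted) . words . map toLower

main :: IO ()
main = do
  -- Reads standard input, e.g. ./wordfreq < input.txt (both names are placeholders).
  text <- getContents
  mapM_ print (report ["poultry", "outwits", "ants"] text)
```

For example, `report ["poultry", "outwits", "ants"] "Ants like poultry but ants like sugar"` evaluates to `[(2,"like"),(1,"sugar"),(1,"but")]`. Filtering happens on the flat word list before `group`, which avoids applying a word filter to the grouped lists as in the question's code.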

Upvotes: 1
