Reputation: 1169
I am trying to create a program that reads a text file, splits the text into a list of words, and then creates a list of tuples pairing each word with the number of times it occurs in the text. I then need to be able to remove certain words from the list and print the final list.
I have tried different ways to filter Strings out of a list of Strings in Haskell, with no success. I have found that the filter function is the best fit for what I want to do, but I am not sure how to implement it.
The code I have so far splits the text read from a file into a list of Strings:
toWords :: String -> [String]
toWords s = words s
I then added this to remove specific Strings from the list:
toWords :: String -> [String]
toWords s = words s
toWords s = filter (`elem` "an")
toWords s = filter (`elem` "the")
toWords s = filter (`elem` "for")
I know this is wrong, but I am unsure how to do it correctly. Can anyone help me with this?
Here is my full code so far:
main = do
    contents <- readFile "testFile.txt"
    let lowContents = map toLower contents
    let outStr = countWords (lowContents)
    let finalStr = sortOccurrences (outStr)
    print outStr
-- Counts all the words.
countWords :: String -> [(String, Int)]
countWords fileContents = countOccurrences (toWords fileContents)
-- Splits words.
toWords :: String -> [String]
toWords s = words s
toWords s = filter (`elem` "an")
toWords s = filter (`elem` "the")
toWords s = filter (`elem` "for")
-- Counts, how often each string in the given list appears.
countOccurrences :: [String] -> [(String, Int)]
countOccurrences xs = map (\xs -> (head xs, length xs)) . group . sort $ xs
-- Sort list in order of occurrences.
sortOccurrences :: [(String, Int)] -> [(String, Int)]
sortOccurrences sort = sortBy comparing snd
Upvotes: 0
Views: 2981
Reputation: 116174
This keeps every word except the forbidden ones:
toWords s = filter (\w -> w `notElem` ["an","the","for"]) (words s)
Equivalent variants:
-- explicit not
toWords s = filter (\w -> not (w `elem` ["an","the","for"])) (words s)
-- using and (&&) instead of elem
toWords s = filter (\w -> w/="an" && w/="the" && w/="for") (words s)
-- using where to define a custom predicate
toWords s = filter predicate (words s)
  where predicate w = w/="an" && w/="the" && w/="for"
-- pointfree
toWords = filter (flip notElem ["an","the","for"]) . words
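For context, here is a minimal sketch of how the filtering version of toWords could fit into the rest of the program from the question. It is one possible arrangement, not the only one. The name stopWords and the sample text in main are assumptions added for illustration; the question reads its input from "testFile.txt" instead. The sketch also passes `comparing snd` to sortBy in parentheses, which the question's sortOccurrences would need in order to type-check.

```haskell
import Data.Char (toLower)
import Data.List (group, sort, sortBy)
import Data.Ord (comparing)

-- Words to drop. The name stopWords is hypothetical; the list itself
-- comes from the question.
stopWords :: [String]
stopWords = ["an", "the", "for"]

-- Split into words, then keep only those that are not stop words.
toWords :: String -> [String]
toWords = filter (`notElem` stopWords) . words

-- Count how often each string appears. group never yields an empty
-- list, so the pattern ws@(w:_) always matches.
countOccurrences :: [String] -> [(String, Int)]
countOccurrences xs = [(w, length ws) | ws@(w:_) <- group (sort xs)]

-- Lower-case the text, split and filter it, then count the words.
countWords :: String -> [(String, Int)]
countWords = countOccurrences . toWords . map toLower

-- Sort in ascending order of occurrences.
sortOccurrences :: [(String, Int)] -> [(String, Int)]
sortOccurrences = sortBy (comparing snd)

main :: IO ()
main = do
  -- Sample input (an assumption); the question uses
  -- contents <- readFile "testFile.txt" here instead.
  let contents = "The cat saw the other cat for an hour"
  print (sortOccurrences (countWords contents))
```

With the sample text, "the", "for" and "an" are removed before counting, so only "cat", "saw", "other" and "hour" are tallied.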
Upvotes: 3
Reputation: 1194
filter is what is known in Haskell as a higher-order function: a function that takes another function as an argument. It is worth reading about them, since functions of this kind can be very useful.
Maybe what you are looking for is something like this:
toWords s = filter condition (words s)
Here "condition" is itself a function: it takes one element and returns True if that element should be kept.
A small example: if you have a list of numbers and want to keep only the numbers greater than 10, it would look something like this:
filterNumbers n = filter (>10) n
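To make the idea concrete, here is a small self-contained sketch that shows both uses of filter side by side. The predicate named condition and the sample inputs are assumptions chosen for illustration; any function of type String -> Bool could stand in for it.

```haskell
-- Keep only the numbers greater than 10. The section (> 10) is the
-- predicate handed to filter.
filterNumbers :: [Int] -> [Int]
filterNumbers = filter (> 10)

-- Hypothetical predicate: True for the words we want to keep.
condition :: String -> Bool
condition w = w `notElem` ["an", "the", "for"]

-- Split the text into words first, then filter the resulting list.
toWords :: String -> [String]
toWords s = filter condition (words s)

main :: IO ()
main = do
  print (filterNumbers [3, 12, 10, 42])       -- prints [12,42]
  print (toWords "an apple for the teacher")  -- prints ["apple","teacher"]
```

Note that filter is applied to `words s` rather than to `s` itself: a String is a list of characters, so filtering `s` directly would test individual characters instead of whole words.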
Upvotes: 0