Reputation: 29
I am new to Haskell and would like some direction to solving my problem. I wanted to have a text encode function that list in which each word of the text is represented by its index. For e.g. :
["The more I like, the more I love.","The more I love, the more I hate."]
the output might be
(["The", "more", "I", "like", "the", "love.", "love,", "hate."],
[1, 2, 3, 4, 5, 2, 3, 6, 1, 2, 3, 7, 1, 2, 3, 8])
I have done the remove duplication part
removeDuplicates :: Eq a => [a] -> [a]
removeDuplicates = rdHelper []
where rdHelper seen [] = seen
rdHelper seen (x:xs)
| x `elem` seen = rdHelper seen xs
| otherwise = rdHelper (seen ++ [x]) xs
Upvotes: 2
Views: 171
Reputation: 26201
I guess the Data.Map
and Data.Set
packages are ideal tools to solve this job efficiently. My implementation would be as follows;
import qualified Data.Map.Lazy as Map
import qualified Data.Set as Set
encode :: [String] -> ([String],[[Int]])
encode wss = let dict = Map.fromList . zip (Set.toList . Set.unions . map (Set.fromList . words) $ wss) $ [1..]
in (map fst $ Map.toList dict, map (map (flip (Map.findWithDefault 0) dict) . words) wss)
*Main> encode ["Are you allright", "Hey there how are you", "Hello there", "Do you like coffee"]
(["Are","Do","Hello","Hey","allright","are","coffee","how","like","there","you"],[[1,11,5],[4,10,8,6,11],[3,10],[2,11,9,7]])
Upvotes: 1
Reputation: 15045
You can just iterate over the list of words and accumulate the unique words and its indexes. If the element is in the accumulated list, append the index to the accumulated list of indexes. If the element isn't in the list, append the new index (length of the list of words + 1).
To be honest, Haskell
code is more understandable, than my description:
import Data.List (findIndex)
build :: ([String], [Int]) -> String -> ([String], [Int])
build (words, indexes) word =
let
maybeIndex = findIndex (== word) words
in
case maybeIndex of
Just index ->
(words, indexes ++ [index + 1])
Nothing ->
(words ++ [word], indexes ++ [(+1) . length $ words])
buildIndexes =
let
listOfWords = words "The more I like, the more I love. The more I love, the more I hate."
in
foldl build ([], []) listOfWords
Here I have a concatenated string as an input
"The more I like, the more I love. The more I love, the more I hate."
Feel free to tailor the code for your needs.
By the way, it might be more performant to insert the elements at the beginning of the lists and then reverse the resulting lists.
import Data.List (findIndex)
build :: ([String], [Int]) -> String -> ([String], [Int])
build (words, indexes) word =
let
maybeIndex = findIndex (== word) words
in
case maybeIndex of
Just index ->
(words, (index + 1) : indexes)
Nothing ->
(word : words, ((+1) . length $ words) : indexes)
buildIndexes =
let
listOfWords = words "The more I like, the more I love. The more I love, the more I hate."
(listOfUniqueWords, listOfIndexes) = foldl build ([], []) listOfWords
in
(reverse listOfUniqueWords, reverse listOfIndexes)
Upvotes: 1