Reputation: 1807
I am implementing some algorithm on haskell. This algorithm requires generating some data.
I have a function of an algorithm which takes generation function as a parameter. For example, algorithm is just multiplying input data by n:
algo :: a -> ??? -> [a]
algo n dgf = map (\x -> x * n) $ dgf
dgf
is used to generate data. How to write function header correctly, as dgf
can be any function with any number of parameters?
Another variant is accepting not the generation function but already generated data.
algo :: a -> [b] -> [a]
algo n d = (\x -> n*x) d
So, now let's imagine I'm generation data with stdGen
, which uses IO. How can I make function more generic, so that it could accept both IO instance and plain values like just [1,2,3]
. This also relates to variant with function, as it can also produce IO.
All in all, which solution is better - having a generation function or a pre-generated data?
Thanks in advance.
Upvotes: 1
Views: 166
Reputation: 89043
For contrast, I'm going to take the opposite approach as dfeuer's answer.
Just use lists.
Consider your first example:
algo :: a -> ??? -> [a]
algo n dgf = map (\x -> x * n) $ dgf
You ask "How to write function header correctly, as dgf can be any function with any number of parameters?"
Well, one way is to use uncurrying.
Normally, Haskell functions are curried. If we have a function like
add :: Int -> Int -> Int
add x y = x + y
And we want a function that adds two to its input we can just use add 2
.
>>> map (add 2) [1..10]
[3,4,5,6,7,8,9,10,11,12]
Because add
is not actually a function that takes two arguments,
it's a function of one argument that returns a function of one argument.
We could have added parentheses to the argument of add above to make this more clear:
add :: Int -> (Int -> Int)
In Haskell, all functions are functions of one argument.
However, we can also go the other way - uncurry
a function
that returns a function to get a function that takes a pair:
>>> :t uncurry
uncurry :: (a -> b -> c) -> (a, b) -> c
>>> :t uncurry add
uncurry add :: (Int, Int) -> Int
This can also be useful, say if we want to find the sum of each pair in a list:
>>> map (uncurry add) [ (1,2), (3,4), (5,6), (7,8), (9,10) ]
[3,7,11,15,19]
In general, we can uncurry any function of type a0-> a1 -> ... -> aN -> b
into a function (a0, a1, ..., aN) -> b
, though there might not be
a cute library function to do it for us.
With that in mind, we could implement algo
by passing it an uncurried
function and a tuple of values:
algo :: Num a => a -> (t -> [a]) -> t -> [a]
algo n f t = map (\x -> x * n) $ f t
And then use anonymous functions to uncurry our argument functions:
>>> algo 2 (\(lo,hi) -> enumFromTo lo hi) (5, 10)
[10,12,14,16,18,20]
>>> algo 3 (\(a,b,c,d) -> zipWith (+) [a..b] [c..d]) (1, 5, 10, 14)
[33,39,45,51,57]
Now we could do it this way, but we don't need to. As implemented above,
algo
is only using f
and t
once. So why not pass it the list directly?
algo' :: Num a => a -> [a] -> [a]
algo' n ns = map (\x -> x * n) ns
It calculates the same results:
>>> algo' 2 $ (\(lo,hi) -> enumFromTo lo hi) (5, 10)
[10,12,14,16,18,20]
>>> algo' 2 $ enumFromTo 5 10
[10,12,14,16,18,20]
>>> algo' 3 $ (\(a,b,c,d) -> zipWith (+) [a..b] [c..d]) (1, 5, 10, 14)
[33,39,45,51,57]
>>> algo' 3 $ zipWith (+) [1..5] [10..14]
[33,39,45,51,57]
Furthermore, since haskell is non-strict, the argument to algo'
isn't evaluated
until it's actually used, so we don't have to worry about "wasting" time computing
arguments that won't actually be used:
algo'' :: Num a => a -> [a] -> [a]
algo'' n ns = [n,n,n,n]
algo''
doesn't use the list passed to it, so it's never forced, so whatever
computation is used to calculate it never runs:
>>> let isPrime n = n > 2 && null [ i | i <- [2..n-1], n `rem` i == 0 ]
>>> :set +s
>>> isPrime 10000019
True
(6.18 secs, 2,000,067,648 bytes)
>>> algo'' 5 (filter isPrime [1..999999999999999])
[5,5,5,5]
(0.01 secs, 68,936 bytes)
Now to the second part of your question - what if your data is being generated within some monad?
Rather than convince algo
to operate on monadic values, you could take the stream
based approach as dfeuer explains. Or you could just use a list.
Just because you're in a monad, doesn't mean that your values suddenly become strict.
For example, want a infinite list of random numbers? No problem.
newRandoms :: Num a -> IO [a]
newRandoms = unfoldr (\g -> Just (random g)) <$> newStdGen
Now I can just pass those to some algorithm:
>>> rints <- newRandoms :: IO [Int]
(0.00 secs, 60,624 bytes)
>>> algo'' 5 rints
[5,5,5,5]
(0.00 secs, 68,920 bytes)
For a small program which is just reading input from a file or two, there's no problem
with just using readFile
and lazy I/O to get a list to operate on.
For example
>>> let grep pat lines = [ line | line <- lines, pat `isInfixOf` line ]
>>> :set +s
>>> dict <- lines <$> readFile "/usr/share/dict/words"
(0.01 secs, 81,504 bytes)
>>> grep "poop" dict
["apoop","epoophoron","nincompoop","nincompoopery","nincompoophood","nincompoopish","poop","pooped","poophyte","poophytic","whisterpoop"]
(0.72 secs, 423,650,152 bytes)
Upvotes: 2
Reputation: 48580
One option is to take a stream rather than a list. If generating the values involves performing IO
, and there may be many many values, this is often the best approach. There are several packages that offer streams of some sort, but I'll use the streaming
package in this example.
import qualified Streaming.Prelude as S
import Streaming
algo :: Monad m => a -> Stream (Of a) m r -> Stream (Of a) m r
algo a = S.map (a +)
You can read Stream (Of a) m r
as "a way to use operations in m
to produce successive values of type a
and finally a result of type r
". This algo
function doesn't commit to any particular way of generating the data; they can be created purely:
algo a (S.each [these, are, my, elements])
or within IO
,
algo a $ S.takeWhile (> 3) (S.readLn :: Stream (Of Int) IO ())
or using a randomness monad, or whatever you like.
Upvotes: 6