Reputation: 15959
I am trying to solve a training exercise I found at codeingame.com
The question is the following: You have a list of numbers and want to find minimal value for the differences between v_small - v_big
, with the condition that v_big > v_small
and v_big
is before v_small
in the list. In addition the maximal time for this question is 1 second, and maximal memory usage is 512MB.
For small lists a naive algorithm sufficient:
---------------------------------- try1.hs -------------------------------------
import Control.Applicative ((<$>))
main :: IO ()
main = do _ <- getLine
v <- g . f . map read . take 1000 . words <$> getLine --or equivalently
-- v <- g . h . map read . take 1000 . words <$> getLine
print v
f :: [Int] -> [Int]
f [] = []
f xx@(x:xs) = (minimum $ map (\y -> y-x) xx) : (f xs)
g :: [Int] -> Int
g [] = 0
g xs = minimum xs
h :: [Int] -> [Int]
h [] = []
h (x:xs) = (foldr (\y' y -> min (y'-x) y) 0 xs): (h xs)
But I think both functions f
and h
generate n*(n+1)/2
many elements, where n
is the length of the list. Which takes ages for the last list has 99999 many elements.
the next try was finding local maxima and minima and compare only maxima with minima - which should reduce the cost of the algorithm to #maxima*#minima
---------------------------------- try2.hs -------------------------------------
import Control.Applicative ((<$>))
-- import Control.Arrow ((&&&))
data Extremum = Max Int | Min Int deriving (Show)
main :: IO ()
main = do _ <- getLine
e <- getExtremes
print e
getExtremes :: IO Int
getExtremes = minimum . concat . myMap f . headextr .
map read . take 1000 .words <$> getLine
myMap :: (a -> [a] -> [b]) -> [a] -> [[b]]
myMap _ [] = []
myMap g xx@(x:xs) = (g x) xx : myMap g xs
f :: Extremum -> [Extremum] -> [Int]
f (Max y) (Max _:xs) = f (Max y) xs
f (Max y) (Min x:xs) = (min 0 (x-y)): f (Max y) xs
f _ _ = []
headextr :: [Int] -> [Extremum]
headextr xx@(x:y:_) | x > y = Max x : extremes xx
| x < y = Min x : extremes xx
headextr xx = extremes xx
extremes :: [Int] -> [Extremum]
extremes [] = []
extremes [x] = [Max x, Min x]
extremes [x,y] | x > y = Min y:[]
| otherwise = Max y:[]
extremes (x:y:z:xs) | x > y && y < z = Min y:extremes (y:z:xs)
| x < y && y > z = Max y:extremes (y:z:xs)
| otherwise = extremes (y:z:xs)
But still the desired time of 1 second is not met.
I reduced the input with take 1000
for profiling, but as I am not a professional programmer I was not able to make any use of it, the only information I found out - which quite obvious - in the first versions f/h
are the costly functions and in the second version f
too is the culprit.
the input file for this exercise can be found at http://www.codingame.com/ide/fileservlet?id=372552140039 - (if this link is not working it can be found at www.codingame.com -> training -> Stock exchange losses -> Test_5_input.txt/Test_5_output.txt)
So how to speed up this algorithm or is there another algorithm that is faster?
Upvotes: 1
Views: 553
Reputation: 24166
Here's a solution using a Monoid
, BiggestDrop
that keeps track of the biggest drop across numbers. It remembers a third piece of information, the minimum in a range of numbers. This would allow us to split the dataset into pieces, process those pieces, and then combine the pieces to get the answer. The example code below doesn't take advantage of this; it merely folds the Monoid
's mappend
across the dataset once.
There's probably a better way to write the Monoid
s that would be faster.
I tried using the 'pipes' library, since it seemed appropriate for this problem, but I don't think it added anything to the solution.
{-# LANGUAGE ScopedTypeVariables #-}
module Main (
main
) where
import System.IO
import Data.Maybe
import Data.Monoid
import Data.Char
import Control.Applicative
import Control.Monad
import Pipes
import qualified Pipes.Prelude as P
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as C
-- Min monoid
newtype Min a = Min {
getMin :: Maybe a
} deriving (Show, Eq)
instance (Ord a) => Monoid (Min a) where
mempty = Min Nothing
mappend x y = Min $ ((liftM2 min) (getMin x) (getMin y)) <|> (getMin x) <|> (getMin y)
toMin = Min . Just
-- Max monoid
newtype Max a = Max {
getMax :: Maybe a
} deriving (Show, Eq)
instance (Ord a) => Monoid (Max a) where
mempty = Max Nothing
mappend x y = Max $ ((liftM2 max) (getMax x) (getMax y)) <|> (getMax x) <|> (getMax y)
toMax = Max . Just
-- Extrema monoid
type Extrema a = (Min a, Max a)
getMinimum = getMin . fst
getMaximum = getMax . snd
toExtrema x = (toMin x, toMax x)
-- Biggest drop monoid
data BiggestDrop a = BiggestDrop {
extrema :: Extrema a,
biggestDrop :: Max a
} deriving Show
instance (Num a, Ord a) => Monoid (BiggestDrop a) where
mempty = BiggestDrop {
extrema = mempty,
biggestDrop = mempty
}
mappend before after = BiggestDrop {
extrema = mappend (extrema before) (extrema after),
biggestDrop = mconcat [
biggestDrop before,
biggestDrop after,
Max $ (liftM2 (-)) (getMaximum $ extrema before) (getMinimum $ extrema after)
]
}
toBiggestDrop x = BiggestDrop {
extrema = toExtrema x,
biggestDrop = mempty
}
-- Read data from stdin and fold BiggestDrop's mappend across it
main = do
(answer :: BiggestDrop Int) <- P.fold mappend mempty id (words >-> (P.map (toBiggestDrop . read)))
print answer
print . fromJust . getMax $ biggestDrop answer
where
words = stdinWords >-> (P.map C.unpack) >-> (P.filter ((/=) []))
-- Produce words from stdin
stdinWords' :: (MonadIO m) => Int -> Producer B.ByteString m ()
stdinWords' chunkSize = goMore B.empty
where
goMore remainder = do
eof <- liftIO isEOF
case eof of
True ->
unless (B.null remainder) $ yield remainder
_ -> do
chunk <- liftIO $ B.hGet stdin chunkSize
let (first:others) = C.splitWith isSpace chunk
goParts ((B.append remainder first):others)
goParts parts = do
case parts of
[] ->
goMore B.empty
[x] ->
goMore x
(x:xs) -> do
unless (B.null x) $ yield x
goParts xs
stdinWords = stdinWords' 512
I put the above code together using the 'pipes' library, hoping to learn about 'pipes-bytestring'. I had to give up and write a producer to read the words from the file. The size of chunks to read from the file is just a guess.
Upvotes: 1
Reputation: 153172
Your first two solutions are slow because for each element in the list, they do a computation that visits all successive elements. This is O(n^2). I haven't entirely grokked your second solution, but it seems to be something like this: filter out only local extrema (where local extremum means it's bigger than its two neighbors or smaller than its two neighbors), then run an O(n^2) algorithm on the list of extrema. Unfortunately, in the worst case every element can be an extremum, so this is also O(n^2) overall. (In fact, in a random list, we expect most elements to be local extrema, so this is not just being pessimistic about things.)
Let's see if we can invent an O(n) algorithm instead.
We'll start with a slightly rephrased O(n^2) algorithm. The idea of this algorithm is this: First, nondeterministically choose a place in the list to act as v_big
. Then, nondeterministically choose a later place in the list to act as v_small
. Take the maximal value among all these nondeterministic choices. In code:
f_spec xs = maximum $ do
later@(v_big:_) <- tails xs
v_small:_ <- tails later
return (v_big - v_small)
Now, we need two separate insights to turn this into an O(n) algorithm. The first is that we only need to split once: once we have chosen v_small
, we know that the right way to choose v_big
is to pick the biggest element before it in the list. We can implement that algorithm this way:
f_slow xs = maximum $ do
earlier@(v_small:_) <- tails (reverse xs)
let v_big = maximum earlier
return (v_big - v_small)
This is "almost" O(n): it only makes one nondeterministic choice; but doing the necessary computation once we've made that choice is still O(n), resulting in a total runtime of O(n^2). The second insight is that we can memoize the computation needed after our nondeterministic choice, so that this computation is O(1). We can build a list of all the maximums very efficiently like this:
maximums xs = scanl1 max xs
Like maximum
, this function takes O(n) time; unlike maximum
, this one returns the maximums of all prefixes of xs
instead of just the maximum of the entire list. So now, when we do our nondeterministic choice, we can choose both v_small
and v_big
at the same time:
f_fast xs = maximum $ do
(v_big, v_small) <- zip (maximums xs) xs
return (v_big - v_small)
From there you only need a bit of prettification to get something that looks very pretty indeed and still runs in O(n) time:
f xs = maximum $ zipWith (-) (maximums xs) xs
Upvotes: 9