epsilonhalbe

Reputation: 15959

find the maximal difference between all succeeding elements

I am trying to solve a training exercise I found at codingame.com

The question is the following: you have a list of numbers and want to find the minimal value of the difference v_small - v_big, under the condition that v_big > v_small and v_big comes before v_small in the list. In addition, the maximal time for this question is 1 second and the maximal memory usage is 512 MB.
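For example, for the list 3 5 1 4 the answer is 1 - 5 = -4 (v_big = 5 appears before v_small = 1).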

For small lists a naive algorithm is sufficient:

---------------------------------- try1.hs -------------------------------------
import Control.Applicative ((<$>))

main :: IO ()
main = do _ <- getLine
          v <- g . f . map read . take 1000 . words <$> getLine --or equivalently
--        v <- g . h . map read . take 1000 . words <$> getLine 
          print v

f :: [Int] -> [Int]
-- for every suffix, pair its head x with each element y of the suffix
-- (including x itself, which contributes 0) and keep the smallest y - x
f [] = []
f xx@(x:xs) = (minimum $ map (\y -> y-x) xx) : (f xs)

g :: [Int] -> Int
-- minimum with a default of 0 for the empty list
g [] = 0
g xs = minimum xs

h :: [Int] -> [Int]
-- same idea as f, written with an explicit fold; the start value 0
-- matches the y - x = 0 case above
h [] = []
h (x:xs) = (foldr (\y' y -> min (y'-x) y) 0 xs): (h xs)

But both functions f and h generate n*(n+1)/2 differences, where n is the length of the list, and that takes ages: the last test list has 99999 elements.
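With n = 99999 that is roughly 5 * 10^9 differences, far too many to compute within the 1-second limit.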

The next try was finding local maxima and minima and comparing only maxima with minima, which should reduce the cost of the algorithm to #maxima * #minima:

---------------------------------- try2.hs -------------------------------------
import Control.Applicative ((<$>))
-- import Control.Arrow ((&&&))

data Extremum = Max Int | Min Int deriving (Show)


main :: IO ()
main = do _ <- getLine
          e <- getExtremes
          print e

getExtremes :: IO Int
getExtremes = minimum . concat . myMap f . headextr .
                                         map read . take 1000 . words <$> getLine

-- apply g to each element paired with the suffix beginning at that element
myMap :: (a -> [a] -> [b]) -> [a] -> [[b]]
myMap _ [] = []
myMap g xx@(x:xs) = (g x) xx : myMap g xs

-- pair a maximum with every later minimum; min 0 caps the result so that
-- a list without any drop yields 0
f :: Extremum -> [Extremum] -> [Int]
f (Max y) (Max _:xs) = f (Max y) xs
f (Max y) (Min x:xs) = (min 0 (x-y)): f (Max y) xs
f _ _ = []

-- classify the head of the list as an extremum before scanning the rest
headextr :: [Int] -> [Extremum]
headextr xx@(x:y:_) | x > y = Max x : extremes xx
                    | x < y = Min x : extremes xx
headextr xx = extremes xx


-- slide a window of three elements and keep the middle one whenever it is
-- a strict local extremum; the final element is always kept
extremes :: [Int] -> [Extremum]
extremes [] = []
extremes [x] = [Max x, Min x]
extremes [x,y]      | x > y          =       Min y:[]
                    | otherwise      =       Max y:[]
extremes (x:y:z:xs) | x > y && y < z = Min y:extremes (y:z:xs)
                    | x < y && y > z = Max y:extremes (y:z:xs)
                    | otherwise      =       extremes (y:z:xs)

But still the desired time of 1 second is not met. I reduced the input with take 1000 for profiling, but as I am not a professional programmer I was not able to make much use of it. The only information I found out, which is quite obvious, is that in the first version f/h are the costly functions, and in the second version f is again the culprit.

The input file for this exercise can be found at http://www.codingame.com/ide/fileservlet?id=372552140039 (if this link is not working, it can be found at www.codingame.com -> training -> Stock exchange losses -> Test_5_input.txt/Test_5_output.txt).

So how can I speed up this algorithm, or is there another algorithm that is faster?

Upvotes: 1

Views: 553

Answers (2)

Cirdec

Reputation: 24166

Here's a solution using a Monoid, BiggestDrop, that keeps track of the biggest drop across numbers. Besides the drop itself it remembers the minimum and maximum in a range of numbers. This would allow us to split the dataset into pieces, process those pieces, and then combine the pieces to get the answer (a small check of this property appears after the code). The example code below doesn't take advantage of this; it merely folds the Monoid's mappend across the dataset once.

There's probably a better way to write the Monoids that would be faster.

I tried using the 'pipes' library, since it seemed appropriate for this problem, but I don't think it added anything to the solution.

{-# LANGUAGE ScopedTypeVariables #-}
module Main (
    main
) where


import System.IO

import Data.Maybe
import Data.Monoid
import Data.Char
import Control.Applicative
import Control.Monad

import Pipes
import qualified Pipes.Prelude as P

import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as C

-- Min monoid
newtype Min a = Min {
    getMin :: Maybe a
} deriving (Show, Eq)

instance (Ord a) => Monoid (Min a) where
    mempty = Min Nothing
    -- take the smaller of two present values; a single present value
    -- wins over Nothing
    mappend x y = Min $ ((liftM2 min) (getMin x) (getMin y)) <|>  (getMin x) <|> (getMin y)

toMin = Min . Just

-- Max monoid
newtype Max a = Max {
    getMax :: Maybe a
} deriving (Show, Eq)

instance (Ord a) => Monoid (Max a) where
    mempty = Max Nothing
    mappend x y = Max $  ((liftM2 max) (getMax x) (getMax y)) <|>  (getMax x) <|> (getMax y)

toMax = Max . Just       

-- Extrema monoid
type Extrema a = (Min a, Max a)

getMinimum = getMin . fst
getMaximum = getMax . snd

toExtrema x = (toMin x, toMax x)

-- Biggest drop monoid

data BiggestDrop a = BiggestDrop {
    extrema :: Extrema a,
    biggestDrop :: Max a
} deriving Show

instance (Num a, Ord a) => Monoid (BiggestDrop a) where
    mempty = BiggestDrop {
        extrema = mempty,
        biggestDrop = mempty
    }
    mappend before after = BiggestDrop {
        extrema = mappend (extrema before) (extrema after),
        biggestDrop = mconcat [
            biggestDrop before,
            biggestDrop after,
            -- a drop that spans the boundary: the highest point in 'before'
            -- minus the lowest point in 'after'
            Max $ (liftM2 (-)) (getMaximum $ extrema before) (getMinimum $ extrema after)
        ]
    }

toBiggestDrop x = BiggestDrop {
        extrema = toExtrema x,
        biggestDrop = mempty
    }

-- Read data from stdin and fold BiggestDrop's mappend across it

main = do
  (answer :: BiggestDrop Int) <- P.fold mappend mempty id (words >-> (P.map (toBiggestDrop . read)))
  print answer
  print . fromJust . getMax $ biggestDrop answer
  where
    words = stdinWords >-> (P.map C.unpack) >-> (P.filter ((/=) []))


-- Produce words from stdin

stdinWords' :: (MonadIO m) => Int -> Producer B.ByteString m ()
stdinWords' chunkSize = goMore B.empty
    where
        goMore remainder = do
            eof <- liftIO isEOF
            case eof of
                True ->
                    unless (B.null remainder) $ yield remainder                    
                _ -> do
                    chunk <- liftIO $ B.hGet stdin chunkSize
                    let (first:others) = C.splitWith isSpace chunk
                    goParts ((B.append remainder first):others)
        goParts parts = do
            case parts of
                [] ->
                    goMore B.empty
                [x] ->
                    goMore x
                (x:xs) -> do
                    unless (B.null x) $ yield x
                    goParts xs

stdinWords = stdinWords' 512 

I put the above code together using the 'pipes' library, hoping to learn about 'pipes-bytestring'. I had to give up and write a producer to read the words from the file. The size of chunks to read from the file is just a guess.
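As a quick sanity check of the split/combine property mentioned above, something along these lines should hold (splitCombineDemo and the sample list are made up here for illustration; they are not part of the original program):

-- Illustrative demo (not from the original answer): folding the two halves
-- separately and combining gives the same biggest drop as folding the
-- whole list at once.
splitCombineDemo :: Bool
splitCombineDemo = getMax (biggestDrop whole) == getMax (biggestDrop combined)
  where
    xs       = [3, 5, 1, 4] :: [Int]
    (ls, rs) = splitAt 2 xs
    whole    = mconcat (map toBiggestDrop xs)
    combined = mconcat (map toBiggestDrop ls)
               `mappend` mconcat (map toBiggestDrop rs)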

Upvotes: 1

Daniel Wagner

Reputation: 153172

Your first two solutions are slow because for each element in the list, they do a computation that visits all successive elements. This is O(n^2). I haven't entirely grokked your second solution, but it seems to be something like this: filter out only the local extrema (where a local extremum is an element that is bigger than both its neighbors or smaller than both its neighbors), then run an O(n^2) algorithm on the list of extrema. Unfortunately, in the worst case every element can be an extremum, so this is also O(n^2) overall. (In fact, in a random list we expect most elements to be local extrema, so this is not just being pessimistic about things.)
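For example, in a zigzag list such as [0, 1, 0, 1, ...] every interior element is a local extremum, so the extrema list is essentially as long as the input.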

Let's see if we can invent an O(n) algorithm instead.

We'll start with a slightly rephrased O(n^2) algorithm. The idea of this algorithm is this: First, nondeterministically choose a place in the list to act as v_big. Then, nondeterministically choose a later place in the list to act as v_small. Take the maximal value among all these nondeterministic choices. In code:

import Data.List (tails)

f_spec xs = maximum $ do
    later@(v_big:_) <- tails xs
    v_small:_       <- tails later
    return (v_big - v_small)

Now, we need two separate insights to turn this into an O(n) algorithm. The first is that we only need to split once: once we have chosen v_small, we know that the right way to choose v_big is to pick the biggest element before it in the list. We can implement that algorithm this way:

f_slow xs = maximum $ do
    earlier@(v_small:_) <- tails (reverse xs)
    let v_big = maximum earlier
    return (v_big - v_small)

This is "almost" O(n): it only makes one nondeterministic choice; but doing the necessary computation once we've made that choice is still O(n), resulting in a total runtime of O(n^2). The second insight is that we can memoize the computation needed after our nondeterministic choice, so that this computation is O(1). We can build a list of all the maximums very efficiently like this:

maximums xs = scanl1 max xs

Like maximum, this function takes O(n) time; unlike maximum, this one returns the maximums of all prefixes of xs instead of just the maximum of the entire list. So now, when we do our nondeterministic choice, we can choose both v_small and v_big at the same time:
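For example, maximums [3, 5, 1, 4] is [3, 5, 5, 5]: each entry is the largest value seen up to that position.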

f_fast xs = maximum $ do
    (v_big, v_small) <- zip (maximums xs) xs
    return (v_big - v_small)

From there you only need a bit of prettification to get something that looks very pretty indeed and still runs in O(n) time:

f xs = maximum $ zipWith (-) (maximums xs) xs
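For example, f [3, 5, 1, 4] is maximum [0, 0, 4, 1] = 4, the drop from 5 down to 1. (If the exercise wants the loss reported as a non-positive number, as in the question's v_small - v_big formulation, just negate the result.)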

Upvotes: 9
