I'll Eat My Hat
I'll Eat My Hat

Reputation: 1352

Profiling F# for performance of recursive functions

I decided to use F# to solve the second problem for the first day of Advent of Code 2018 (performing a cyclic summation and finding the first repeated sum) in an expressive manner, but the performance is lacking and I can't find the cause of the slowdown.

Problem solved as in Python 3

For a given input with ~140,000 summations, this code executes in a few seconds.

data = list(map(int, '''
+1
-1
'''.strip().splitlines()))
from itertools import cycle, accumulate
class superset(set):
    def add(self, other):
        super().add(other)
        return other

def mapwhile(func, pred, iterable):
    for i in iterable:
        if not pred(i):
            yield func(i)
            return
        yield func(i)

def last(iterable):
    return list(iterable)[-1]

s = superset([0])
print(last(mapwhile(s.add, lambda x: x not in s, accumulate(cycle(data)))))

Problem solved as in F#

I added a conditional breakpoint on the match expression to time every thousandth i, it seems this code performs ~100 sums/sec and would not come to a solution even after an hour. A dramatic slowdown by a ridiculous order in magnitude.

let input = @"
+1
-1
"
let cycle xs = seq { while true do yield! xs }
let accumusum xs = Seq.scan(fun acc elem -> acc + elem) 0 xs

let rec findfreqcycle i (s:int Set) (data:int seq) = 
    let head, tail = Seq.head data, Seq.tail data
    match s.Contains(head) with
    | false -> findfreqcycle (i+1) (s.Add(head)) (tail)
    | true ->  head


let data = input.Trim().Split('\n') |> Seq.map int |> Seq.toList |> cycle
accumusum data |> findfreqcycle 0 Set.empty

As far as I can tell, the core ideas behind each code sample are more or less the same. The input is eagerly parsed only once, with a generator function/sequence to lazily repeat each number.

The only difference is that the function to actually find the first repeating summation is recursive in the F# example. Memory profiling indicates almost constant memory usage, and tail recursion is active.

What could I be doing wrong, and how can I better profile these recursive and generative functions for performance?

Upvotes: 4

Views: 286

Answers (3)

OP already has an accepted answer but I thought I propose a few variants.

The task ask for a running aggregate (the Set) over the input values while still allowing an early exit when the Set has the state that we can't add a number to it because we already seen it.

Normally we fold to aggregate a state but fold doesn't allow us to exit early. That's why the suggestion was to use scan which is a streaming fold + pick that allows early exit.

An alternative is to code up a fold that allows shortcutting once a the state has been reached: val foldAndCheck: (a' -> 'b -> CheckResult<'a, 'c>) -> 'a -> 'b seq -> 'c option. fold is like a for loop that is aggregating all values, foldAndCheck is like a for loop that aggregates values up to a point and then returns a result.

It could then look something like:

type [<Struct>] CheckResult<'T, 'U> =
  | Continue of c:'T
  | Done     of d:'U

// val foldAndCheck: (a' -> 'b -> CheckResult<'a, 'c>) -> 'a -> 'b seq -> 'c option
let foldAndCheck f z (s : _ seq) =
  let f = OptimizedClosures.FSharpFunc<_, _, _>.Adapt f
  use e = s.GetEnumerator ()
  let rec loop s =
    if e.MoveNext () then
      match f.Invoke (s, e.Current) with
      | Continue ss -> loop ss
      | Done     rr -> Some rr 
    else
      None
  loop z

let cycle xs = seq { while true do yield! xs }

let run (input : string) =
  let folder s v = if Set.contains v s then Done v else Continue (Set.add v s)
  input.Trim().Split('\n') 
  |> Seq.map int 
  |> cycle
  |> Seq.scan (+) 0
  |> foldAndCheck folder Set.empty

When running it on my machine I get numbers like this:

Result: Some 448
Took  : 280 ms
CC    : (31, 2, 1)

(CC is garbage collection in gen 0, 1 and 2)

Then I created a F# program which I think is equivalent to the Python program as it uses a mutable set and a mapWhile function:

let addAndReturn (set : HashSet<_>) =
  fun v ->
    set.Add v |> ignore
    v

let mapWhile func pred (s : _ seq) =
  seq {
    // F# for v in s ->
    //  doesn't support short-cutting. So therefore the use while
    use e = s.GetEnumerator ()
    let mutable cont = true
    while cont && e.MoveNext () do
      let v = e.Current
      if not (pred v) then
        cont <- false
        yield func v
      else
        yield func v
  }

let cycle xs = seq { while true do yield! xs }

let accumulate xs = Seq.scan (+) 0 xs

let last xs = Seq.last xs

let run (input : string) =
  let data = input.Trim().Split('\n') |> Seq.map int 
  let s = HashSet<int> ()

  data
  |> cycle
  |> accumulate
  |> mapWhile (addAndReturn s) (fun x -> s.Contains x |> not)
  |> last

The performance numbers:

Result: 448
Took  : 50 ms
CC    : (1, 1, 1)

If we say we allow mutation + seq a solution could look like this:

let cycle xs = seq { while true do yield! xs }

let run (input : string) =
  let s = HashSet<int> ()

  input.Trim().Split('\n')
  |> Seq.map int 
  |> cycle
  |> Seq.scan (+) 0
  |> Seq.find (fun v -> s.Add v |> not)

Which runs like this:

Result: 448
Took  : 40 ms
CC    : (1, 1, 1)

There are other cools tricks that could be applied to increase the search performance even more but it will not be worth the effort as the majority of the cost is in parsing the integers at this point.

Upvotes: 3

I&#39;ll Eat My Hat
I&#39;ll Eat My Hat

Reputation: 1352

I decided to give implementing using Seq.scan and Seq.pick a try according to Tomas' answer, and got this result. He was right, it's not great. On the upside, it executes in ~0.3 seconds.

let cycle xs = seq { while true do yield! xs }    
let accumusum xs = Seq.scan(fun acc elem -> acc + elem) 0 xs

let tryfind (sum, s:int Set) =
    match s.Contains(sum) with
    | true -> Some(sum)
    | false -> None

let scanstate (sum, s:int Set) el =
    el, s.Add(sum)

let findfreqcycle (data:int seq) =
    let seen = Seq.scan scanstate (Seq.head data, Set.empty) (Seq.tail data)
    Seq.pick tryfind seen

let data = cycle <| (input.Trim().Split('\n') |> Seq.map int |> Seq.toList)
accumusum data |> findfreqcycle

Upvotes: 5

Tomas Petricek
Tomas Petricek

Reputation: 243096

As mentioned in the comments, Seq.tail is horribly inefficient, especially if you use it in a loop in the way you do. The reason is that it creates a new sequence that iterates over the original sequence and skips the first element (so, after 1000 iterations, you have to go over 1000 sequences, each skipping one element).

The pattern with head and tail works much better if you use a list, because functional lists have been designed for this kind of processing. In your case, you could do something like this (which follows the same pattern as your original function):

let rec findfreqcycle sum (s:int Set) input data = 
    match data with 
    | x::xs when s.Contains (sum + x) -> (sum + x)
    | x::xs -> findfreqcycle (sum + x) (s.Add (sum + x)) input xs
    | [] ->  findfreqcycle sum s input input

let data = input.Trim().Split('\n') |> Seq.map int |> Seq.toList 
findfreqcycle 0 Set.empty data data

I changed it so that it uses pattern matching (on lists). I also changed the code so that it takes a finite list and, when it gets to the end, it starts again. As a consequence, it also sums the numbers on the fly (rather than using Seq.scan - that would not work here, because I'm not using infinite lists).

On the input from Pastebin, I get the result 448 in about 0.17 seconds.

Upvotes: 5

Related Questions