Reputation: 265
I need help optimizing this code, it takes 12 seconds and I need it to take around 4seconds.
let publication idx (lst: string [] list) = // returns a specific value of the string [] list
lst
|>Array.map (fun arr -> arr.[idx])
let rec Tuple (x:int) (ID:string list) (Information:string [] list) =
if x < ID.Length then
let muro = [|(ID.[x], Information|> List.filter (fun elem -> elem.[1] = humanID.[x] )|> publication 0 |> List.toArray )|]
let rest = Tuple (x+1) ID Information
Array.append muro rest
else [||]
let FinalTuple= Tuple 0 ID Information
finalTuple is a : (string*string []) []
the recursion is taking to much time to finish and I cant seem to make it faster (ID.Length is 1600)
thanks for your help
Upvotes: 1
Views: 132
Reputation: 15391
I agree with @mydogisbox's general point - going back and forth from array to list is not going to help much performance wise.
That being said, the first problem I had was, I had to dig a bit to just figure out what the code was doing - so I took the liberty to make a first rewrite, just to see if I understood what was going on:
let extract1 (ids:string[]) (info:string[][]) =
[|
for id in ids ->
(id, [| for record in info do if record.[1] = id then yield record.[0] |])
|]
My sense, reading your code, was this: given an array of "records" - arrays containing something of interest (probably a publication?) in field 0 and the author in field 1, the goal was for a given set of author ids, to extract their publications. Or something like that.
Now this is not very pretty. Also, I don't know if this is any good, so let's do a benchmark - a dataset of 1,000,000 records that look like what I think you have:
let ids = [| 1 .. 100 |] |> Array.map string
let rng = System.Random()
let dataset =
[| for i in 0 .. 1000000 ->
[| System.Guid.NewGuid() |> string; rng.Next(0,100) |> string |] |]
Running this in FSI gives me:
> extract1 ids dataset |> ignore;;
Real: 00:00:01.486, CPU: 00:00:01.484, GC gen0: 3, gen1: 3, gen2: 0
val it : unit = ()
Can we make it prettier, or more functional-ish? Let's try:
let extract2 (ids:string[]) (info:string[][]) =
ids
|> Seq.map (fun id ->
id,
info
|> Seq.filter (fun record -> record.[1] = id)
|> Seq.map (fun record -> record.[0])
|> Seq.toArray)
|> Seq.toArray
The verdict?
> extract2 ids dataset |> ignore;;
Real: 00:00:01.588, CPU: 00:00:01.593, GC gen0: 3, gen1: 3, gen2: 0
val it : unit = ()
Prettier, but not better. Maybe the issue is that we are doing multiple passes, one per id. Sounds like maybe we should group?
let extract3 (ids:string[]) (info:string[][]) =
let IDs = ids |> Set.ofArray
info
|> Seq.groupBy (fun row -> row.[1])
|> Seq.filter (fun (id,rows) -> IDs |> Set.contains id)
|> Seq.map (fun (id,rows) -> id, rows |> Seq.toArray)
|> Seq.toArray
> extract3 ids dataset |> ignore;;
Real: 00:00:00.387, CPU: 00:00:00.390, GC gen0: 8, gen1: 8, gen2: 0
val it : unit = ()
Now we are talking. I am sure we could squeeze more - feel free to go at it. The main point, though, is that the code is also simpler (IMO), and conveys more clearly what the intent is. Hope this helps!
Upvotes: 3
Reputation: 19897
When trying to improve the performance of a piece of code, the first thing to do is eliminate unneeded memory allocations. A few places you can improve this:
Mapping over an array is faster than mapping over a list, but not faster than converting to an array, mapping and then converting back to a list.
Avoid using Array.append
. This results in doing a new memory allocation each time. Instead, you might consider a ResizeArray
or a list, where appending is much less expensive.
Try to make your function tail-recursive. In your case, this means you need to pass the intermediary result into the next level of recursing. In your case (if I understand it correctly), this would mean passing muro
down the call chain, and then adding the result of that call to muro
before passing it to the next recursion.
The tail-recursive function would look something like this:
let rec Tuple (x:int) (ID:string list) (Information:string [] list) (result:*correct type here*) =
if x < ID.Length then
let muro = [|(ID.[x], Information|> List.filter (fun elem -> elem.[1] = humanID.[x] )|> publication 0 |> List.toArray )|]
Tuple (x+1) ID Information Array.append muro rest
else result
let FinalTuple= Tuple 0 ID Information [||]
This probably changes the order of the resulting array, so depending on what you need you may need to rework it a bit.
Upvotes: 3