Luke
Luke

Reputation: 107

F# Merge Sort - IndexOutOfRangeException when attempting to implement a match with structure

The overall code for my merge sort, looked something like this:

   let remove array = 
    Array.sub array 1 (array.Length - 1)

let rec merge (chunkA : int[]) (chunkB : int[]) =
    if chunkA.Length = 0 && chunkB.Length = 0 then [||]
    else if chunkA.Length = 0 || chunkB.[0] < chunkA.[0] then Array.append [| chunkB.[0] |] (merge chunkA (remove chunkB))
    else Array.append [| chunkA.[0] |] (merge (remove chunkA) chunkB)

let rec mergesort (array : int[]) =
    let middle = array.Length / 2
    let chunkA = match middle with
                    | 1 -> [| array.[0] |]
                    | _ -> mergesort  [| for i in 0 .. middle - 1 -> array.[i]|]
    let chunkB = match array.Length - middle with
                    | 1 -> [| array.[array.Length - 1] |]
                    | _ -> mergesort [| for i in middle .. array.Length - 1 -> array.[i]|]
    merge chunkA chunkB

This code worked perfectly well, but I wanted to change the series of if statements in the merge function to a match with statement.

I then tried implementing the following code:

let rec merge (chunkA : int[]) (chunkB : int[]) =
match chunkA.Length with
| 0 when chunkA.Length = chunkB.Length -> [||]
| 0 | _ when chunkB.[0] < chunkA.[0] -> Array.append [| chunkB.[0] |] (merge chunkA (remove chunkB))
| _ -> Array.append [| chunkA.[0] |] (merge (remove chunkA) chunkB)

When I ran my code, Visual Studio threw an "IndexOutOfRangeException" at me, specifically here:

| 0 when chunkA.Length = chunkB.Length -> [||]

In this instance, chunkA was empty, but chunkB had a single number in it. Thus, I am not entirely sure why F# even attempted to return this case, since the lengths of chunks A and B were not the same, but I'm also confused as to why this would throw an Index error, specifically on the empty array.

In addition, I am rather new to F# and functional programming in general. If the structure or methodology in my code is not up to par, then please feel free to comment on this as well.

Also, if I'm being thick, please feel free to tell me that as well.

Many Thanks, Luke

Upvotes: 3

Views: 126

Answers (1)

rmunn
rmunn

Reputation: 36688

As Fyodor Soikin pointed out, the source of your exception is this line:

| 0 | _ when chunkB.[0] < chunkA.[0] -> Array.append [| chunkB.[0] |] (merge chunkA (remove chunkB))

But it might not be obvious to you why that's throwing an exception. As I learned last week to my surprise, the when clause in a match expression applies to all the cases since the previous ->. In other words, when you write the line above, F# understands you to mean that you want the when clause applied to either the 0 case or the _ case. (Which is, of course, redundant).

And that's the cause of your exception: F# sees the 0 case but still applies the when chunkB.[0] < chunkA.[0] test — and since chunkA is empty, that's always going to throw an exception. To fix this, you'll have to separate these two cases, so that the when applies to only the case you mean for it to apply to:

| 0 -> Array.append [| chunkB.[0] |] (merge chunkA (remove chunkB))
| _ when chunkB.[0] < chunkA.[0] -> Array.append [| chunkB.[0] |] (merge chunkA (remove chunkB))

Unfortunately, this does mean some duplication of code. In this case it's not a big deal since the duplicated code is a one-liner, but if you have a significant chunk of code that ends up duplicated because of having to split out two cases like this (where the two cases shouldn't share a when condition), then you could turn that duplicated chunk into a function so that it's not duplicated any more.

Edit: I just noticed a section of your code that could be simpler. Your original code included:

let chunkA = match middle with
                | 1 -> [| array.[0] |]
                | _ -> mergesort  [| for i in 0 .. middle - 1 -> array.[i]|]
let chunkB = match array.Length - middle with
                | 1 -> [| array.[array.Length - 1] |]
                | _ -> mergesort [| for i in middle .. array.Length - 1 -> array.[i]|]

What you're doing here is taking a slice of the array, but F# has a very convenient syntax for slicing arrays: array.[start..end] where start and end are the inclusive indices of the slice you want. So the expression [| for i in 0 .. middle - 1 -> array.[i]|] can be simplified down to array.[0 .. middle - 1], and the expression [| for i in middle .. array.Length - 1 -> array.[i]|] can be simplified down to array.[middle .. array.Length - 1]. Let's replace those expressions in your code and see what we get:

let chunkA = match middle with
                | 1 -> [| array.[0] |]
                | _ -> mergesort  array.[0 .. middle - 1]
let chunkB = match array.Length - middle with
                | 1 -> [| array.[array.Length - 1] |]
                | _ -> mergesort array.[middle .. array.Length - 1]

Now, looking at this, I notice that the 1 condition in both cases is actually dealing with the exact same array slice as the _ condition; the only difference is that if middle is 1, you don't call mergesort. How do I know that it's the exact same array slice? Well, if middle is 1, then the array.[0 .. middle-1] expression would become array.[0..0], which is a slice of length 1 in the array starting at index 0, exactly equivelant to [| array.[0] |]. And if array.Length is exactly 1 more than middle, then the array.[middle .. array.Length - 1] expression is going to be array.[middle .. middle], which is exactly equivalent to [| array.[middle] |].

So if it weren't for the call to mergesort, we could consolidate these two expressions. And there's a pretty simple way to do so, in fact! Simply move the length check to the top of mergesort, like so:

let rec mergesort (array : int[]) =
    if array.Length < 2 then
        array  // Arrays of length 0 or 1 are already sorted
    else
        // rest of mergesort function goes here

And now you can consolidate the two cases of your match safely, knowing that you won't hit an infinite-recursion loop:

let middle = array.Length / 2
let chunkA = mergesort array.[0 .. middle - 1]
let chunkB = mergesort array.[middle .. array.Length - 1]
merge chunkA chunkB

Putting all this together, my suggested version of your original mergesort function looks like:

let rec mergesort (array : int[]) =
    if array.Length < 2 then
        array  // Arrays of length 0 or 1 are already sorted
    else
        let middle = array.Length / 2
        let chunkA = mergesort array.[0 .. middle - 1]
        let chunkB = mergesort array.[middle .. array.Length - 1]
        merge chunkA chunkB

As a bonus, this version of mergesort doesn't have the subtle bug your original version had: you forgot to consider the empty-array case. Calling your original mergesort on an empty array would produce an infinite loop. You'll probably benefit more from working it out for yourself than from me explaining how, so I'll just mention that in F# for i in 0 .. -1 is not an error, but will go through the for loop zero times (i.e., the for loop's body will not be executed). And likewise, array.[0..-1] is not an error, but will produce an empty array. Armed with the knowledge of that detail, you should be able to work through the code of your original mergesort function and see that it would infinitely loop if you passed it an empty string. (Although since your mergesort call in that infinite loop is in not tail position, it would not be a tail call. And so the stack would eventually overflow, saving you from a "true" infinite loop).

Upvotes: 3

Related Questions