M.Y. Babt

Reputation: 2891

Subtle type error

I am new to programming and F# is my first .NET language.

I am attempting this problem on Rosalind.info. Basically, given a DNA string, I am supposed to return four integers counting the respective number of times that the symbols 'A', 'C', 'G', and 'T' occur in the string.

Here is the code I have written so far:

open System.IO
open System

type DNANucleobases = {A: int; C: int; G: int; T: int}

let initialLetterCount = {A = 0; C = 0; G = 0; T = 0}

let countEachNucleobase (accumulator: DNANucleobases)(dnaString: string) =
    let dnaCharArray = dnaString.ToCharArray()
    dnaCharArray
    |> Array.map (fun eachLetter -> match eachLetter with
                                    | 'A' -> {accumulator with A = accumulator.A + 1}
                                    | 'C' -> {accumulator with C = accumulator.C + 1}
                                    | 'G' -> {accumulator with G = accumulator.G + 1}
                                    | 'T' -> {accumulator with T = accumulator.T + 1}
                                    | _ -> accumulator)

let readDataset (filePath: string) =
    let datasetArray = File.ReadAllLines filePath 
    String.Join("", datasetArray)

let dataset = readDataset @"C:\Users\Unnamed\Desktop\Documents\Throwaway Documents\rosalind_dna.txt"
Seq.fold countEachNucleobase initialLetterCount dataset

However, I received the following error message:

CountingDNANucleotides.fsx(23,10): error FS0001: Type mismatch. Expecting a
    DNANucleobases -> string -> DNANucleobases
but given a
    DNANucleobases -> string -> DNANucleobases []
The type 'DNANucleobases' does not match the type 'DNANucleobases []'

What went wrong? What changes should I make to correct my mistake?

Upvotes: 1

Views: 141

Answers (1)

Vandroiy

Reputation: 6223

countEachNucleobase returns an array of the accumulator type rather than a single accumulator like the one it received as its first parameter, because Array.map produces one record per character. Seq.fold therefore cannot resolve its 'State type parameter: it would have to be the record on the input side but an array of records on the output side. The function used for folding must take the accumulator type as its first input and return that same type.
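
To make that constraint concrete, here is a minimal sketch that is independent of the question's code. The names countA and total are made up for illustration; only Seq.fold itself comes from the F# core library.

```fsharp
// Seq.fold : ('State -> 'T -> 'State) -> 'State -> seq<'T> -> 'State
// The folder must return the same type it receives as its first argument.

// Hypothetical folder: the state is an int on the way in and on the way out.
let countA (count: int) (letter: char) : int =
    if letter = 'A' then count + 1 else count

// A string is a seq<char>, so each element handed to the folder is a char.
let total = Seq.fold countA 0 "GATTACA"
printfn "%d" total // prints 3

// A folder shaped like the one in the question would instead look like
//     int -> char -> int []
// and is rejected, because the state cannot be both int and int [].
```

Note that folding over a string passes single chars to the folder, so a folder for this job takes a char, not a string, as its second argument.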

Instead of Array.map in the question's code, you can use Array.fold:

let countEachNucleobase (accumulator: DNANucleobases) (dnaString: string) =
    let dnaCharArray = dnaString.ToCharArray()
    dnaCharArray
    |> Array.fold (fun (accumulator : DNANucleobases) eachLetter ->
        match eachLetter with
        | 'A' -> {accumulator with A = accumulator.A + 1}
        | 'C' -> {accumulator with C = accumulator.C + 1}
        | 'G' -> {accumulator with G = accumulator.G + 1}
        | 'T' -> {accumulator with T = accumulator.T + 1}
        | _ -> accumulator) accumulator

The call on the last line then becomes:

countEachNucleobase initialLetterCount dataset

Shorter version

let readChar accumulator = function
    | 'A' -> {accumulator with A = accumulator.A + 1}
    | 'C' -> {accumulator with C = accumulator.C + 1}
    | 'G' -> {accumulator with G = accumulator.G + 1}
    | 'T' -> {accumulator with T = accumulator.T + 1}
    | _ -> accumulator

let countEachNucleobase acc input = Seq.fold readChar acc input

Since a string is a sequence of chars (seq<char>), input accepts strings as well as char arrays and other char sequences.
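
As a quick check, the sketch below repeats the definitions so that it runs on its own and then calls countEachNucleobase on a string, an array and a list. The sample inputs and the names fromString, fromArray and fromList are made up for illustration, and the type annotation on accumulator is added here only for readability.

```fsharp
type DNANucleobases = {A: int; C: int; G: int; T: int}

let initialLetterCount = {A = 0; C = 0; G = 0; T = 0}

// Folder: takes the running counts and one character, returns updated counts.
let readChar (accumulator: DNANucleobases) = function
    | 'A' -> {accumulator with A = accumulator.A + 1}
    | 'C' -> {accumulator with C = accumulator.C + 1}
    | 'G' -> {accumulator with G = accumulator.G + 1}
    | 'T' -> {accumulator with T = accumulator.T + 1}
    | _ -> accumulator // any other character leaves the counts unchanged

let countEachNucleobase acc input = Seq.fold readChar acc input

// The same function applied to three different char sequences.
let fromString = countEachNucleobase initialLetterCount "AGCTTTTCATTC"
let fromArray = countEachNucleobase initialLetterCount [| 'A'; 'G'; 'C'; 'T' |]
let fromList = countEachNucleobase initialLetterCount [ 'A'; 'A'; 'X' ]

printfn "%d %d %d %d" fromString.A fromString.C fromString.G fromString.T
// prints 2 3 1 6
```

Printing the four fields separated by spaces matches the output format the Rosalind problem asks for.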

Upvotes: 3
