Reputation: 181
I'm struggling to get my head around using the CSV type provider in F# for simple data analysis tasks. I have done some googling around the 'Seq' module and the CSV type provider as a whole but can't find resources relevant to my issue, so help is appreciated.
I'm attempting to use F# to create metrics on horse racing data (for each runner within a race). My data is in a CSV file and has a structure similar to this: raceId, runnerId, name, finishingPosition, startingPrice, etc.
So what I want to do initially is group the CSV rows by raceId and create extra 'insights' on each race (an example here would be 'positionInBetting', using 'startingPrice' for each runner within the race).
This is what I have:
open FSharp.Data
type Runner = CsvProvider<Sample="runners.csv",AssumeMissingValues=true>
let dataset = Runner.Load("runners.csv")
let racesSince2010 =
    dataset.Rows
    |> Seq.filter (fun r -> r.Meeting_date.IsSome && r.Meeting_date.Value > new System.DateTime(2010,1,1))
    |> Seq.groupBy (fun r -> r.Race_id)
So this achieves the first part, grouping runners by race, and gives me a seq of tuples where the key is the raceId and the value is a seq of Runners (I assume, but VS tells me it is actually a seq<CsvProvider<...>.Row>).
Then I expected this to work:
let raceDetails (raceId, runnersList:seq<Runner>) = runnersList |> Seq.iter ( fun r -> printfn "race: %i runner: %s" raceId r.)
but r.name isn't available in VS IntelliSense. I know I'm failing to understand why the output of my grouping function is defined as seq<CsvProvider<...>.Row> instead of seq<Runner>, but I can't find anything that explains it to me, or how to attack the problem I am having.
Alex
Upvotes: 2
Views: 128
Reputation: 12184
type Runner = CsvProvider<Sample="runners.csv",AssumeMissingValues=true>
This statement defines a type that represents an entire .csv file, not just a single row of the csv. Nested types are created within the primary type which represent internal data structures within the file (such as the row structure in the case of csv files).
Hence, Runner does not have a name associated with it, but Runner.Row should.
This distinction perhaps isn't so obvious for CSV files at first glance, but internal structures become far more important if you are dealing with, e.g., XML.
This should work:
let raceDetails (raceId, runnersList:seq<Runner.Row>) =
    runnersList
    |> Seq.iter ( fun r -> printfn "race: %i runner: %s" raceId r.name)
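Once the rows are typed correctly, the 'positionInBetting' idea from the question could be approached along these lines. This is only a rough sketch: to keep it runnable without runners.csv, it uses a hypothetical record, RunnerRow, as a stand-in for Runner.Row, with made-up field names and sample data, and it assumes a lower startingPrice means a shorter price in the betting.

```fsharp
// Hypothetical stand-in for Runner.Row, so the sketch runs without runners.csv.
type RunnerRow =
    { RaceId: int
      Name: string
      StartingPrice: decimal }

// Made-up sample rows, for illustration only.
let rows =
    [ { RaceId = 1; Name = "A"; StartingPrice = 5.0m }
      { RaceId = 1; Name = "B"; StartingPrice = 2.5m }
      { RaceId = 2; Name = "C"; StartingPrice = 11.0m }
      { RaceId = 1; Name = "D"; StartingPrice = 9.0m }
      { RaceId = 2; Name = "E"; StartingPrice = 3.0m } ]

// For each race, sort the runners by starting price (lowest first) and
// pair each runner with its 1-based position in the betting.
let positionsInBetting (rows: seq<RunnerRow>) =
    rows
    |> Seq.groupBy (fun r -> r.RaceId)
    |> Seq.collect (fun (raceId, runners) ->
        runners
        |> Seq.sortBy (fun r -> r.StartingPrice)
        |> Seq.mapi (fun i r -> raceId, r.Name, i + 1))
    |> Seq.toList

positionsInBetting rows
|> List.iter (fun (raceId, name, pos) ->
    printfn "race: %i runner: %s positionInBetting: %i" raceId name pos)
```

With the real data, the same pipeline should apply to dataset.Rows, with the record fields swapped for the corresponding Runner.Row properties. Note that this sketch does not treat ties specially: runners with equal starting prices get consecutive positions rather than a shared one.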
Upvotes: 2