Reputation: 726
How do I convert a whole row in a sequence from decimal to float, remove missing or NaN values, and then turn those same values back into decimals, all in the same function?
Any suggestions?
By row I mean the row you select when you create a type from a CSV Provider.
type IncomeCsv = CsvProvider<IncomeCsvFile>
IncomeCsv.GetSample().Rows
|> Seq.filter (fun row -> row.State = "TX")
|> List.ofSeq
For one observation of TX, I am getting these values:
[(TX, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan)]
This is an example of one observation from many in a SEQUENCE. I need to filter this specific observation out by using the function described above.
I have tried using Double.IsNaN, but for some reason it is not working.
|> Seq.filter (fun element -> not (Double.IsNaN element))
I am getting this error:
Practice2.fsx(39,53): error FS0001: This expression was expected to have
type
'float'
but here has type
'CsvProvider<...>.Row'
Upvotes: 0
Views: 79
Reputation: 6324
You should take a look at both the Csv type provider and the Csv file parser documentation. For example, you can directly apply Filter and Map on the Csv provided type to transform your data. In that case you would operate on the type directly (not its Row, e.g. on CsvFile.GetSample()). Also, the Csv file parser is better suited for malformed data. There may be options to specify the schema and types directly, as well as to deal with missing values.
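As a rough sketch of the Filter approach, assuming the FSharp.Data package is available and that the Income column is inferred as float because it contains missing values: the inline sample string and the names cleaned and incomes are illustrative only, not taken from the question's file.

```fsharp
#r "nuget: FSharp.Data"
open System
open FSharp.Data

// Inline sample (an assumption, used so the sketch does not depend on a file on disk).
type IncomeCsv = CsvProvider<"Id,State,Income\n40,TX,2000.1\n48,MO,#N/A\n15,TX,3000\n78,TN,\n41,VT,">

// Filter is applied to the provided CSV object itself, not to its Rows sequence.
let cleaned =
    IncomeCsv.GetSample()
        .Filter(fun row -> row.State = "TX")
        .Filter(fun row -> not (Double.IsNaN row.Income))

// Convert the remaining float values to decimal once the NaN rows are gone.
let incomes =
    cleaned.Rows
    |> Seq.map (fun row -> row.Id, row.State, decimal row.Income)
    |> List.ofSeq
```

Note that the predicate receives a whole row and tests one float field on it, rather than passing the row itself to Double.IsNaN.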
You can of course filter out nan and cast float to decimal in the usual way as well (this operates on CsvProvider.Row):
data
|> Seq.filter (fun x -> not (Double.IsNaN(x.Income)))
|> Seq.map (fun x -> (x.Id, x.State, decimal x.Income))
// val it : seq<int * string * decimal> = seq [(40, "TX", 2000.1M); (15, "TX", 3000M)]
The data I used:
Id,State,Income
40,TX,2000.1
48,MO,#N/A
15,TX,3000
78,TN,
41,VT,
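The rows in the question have many numeric columns rather than one. The following is a minimal sketch of the same idea for that case, using a hypothetical record in place of the provided Row so that it runs without FSharp.Data. The type IncomeRow, its field names, and the helpers numericValues and hasNoNaN are all assumptions made for illustration.

```fsharp
open System

// Hypothetical stand-in for a provided Row with several float columns.
type IncomeRow = { State: string; Income2019: float; Income2020: float }

// List the numeric columns explicitly: a row is not a float, so Double.IsNaN
// has to be applied to its fields, not to the row itself.
let numericValues (row: IncomeRow) = [ row.Income2019; row.Income2020 ]

// A row is kept only if none of its numeric columns is NaN.
let hasNoNaN (row: IncomeRow) =
    numericValues row |> List.forall (fun v -> not (Double.IsNaN v))

let rows =
    [ { State = "TX"; Income2019 = 2000.1; Income2020 = 2100.0 }
      { State = "TX"; Income2019 = nan; Income2020 = nan }
      { State = "MO"; Income2019 = 1500.0; Income2020 = nan } ]

// Filter by state, drop rows containing NaN, then convert floats to decimals.
let cleaned =
    rows
    |> List.filter (fun row -> row.State = "TX")
    |> List.filter hasNoNaN
    |> List.map (fun row -> row.State, decimal row.Income2019, decimal row.Income2020)
```

Swapping List.forall for List.exists in hasNoNaN would instead keep any row that has at least one usable value, which may suit data where only some columns are missing.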
Upvotes: 1