SantiClaus

Reputation: 726

How to convert a whole ROW in a SEQ from a decimal to a float, and then back again?

How do I convert a whole row in a sequence from decimal to float, delete missing or NaN values, and then turn the remaining values back into decimals, all in the same function?

Any suggestions?

By row I mean the row you select when you create a type from a CSV Provider.

type IncomeCsv = CsvProvider<IncomeCsvFile>
IncomeCsv.GetSample().Rows
|> Seq.filter (fun row -> row.State = "TX")
|> List.ofSeq

For one observation of TX, I am getting these values:

[(TX, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
nan, nan, nan, nan, nan, nan, nan, nan, nan, nan)]

This is an example of one observation from many in a SEQUENCE. I need to filter this specific observation out by using the function described above.

I have tried using Double.IsNaN, but for some reason it is not working.

|> Seq.filter (fun element -> not (Double.IsNaN element))

I am getting this error:

  Practice2.fsx(39,53): error FS0001: This expression was expected to have 
     type
  'float'    
     but here has type
  'CsvProvider<...>.Row'

Upvotes: 0

Views: 79

Answers (1)

s952163

Reputation: 6324

You should take a look at the documentation for both the CSV type provider and the CSV file parser. For example, you can apply Filter and Map directly to the CSV provided type to transform your data. In that case you operate on the type itself (e.g. on CsvFile.GetSample()), not on its Rows. Also, the CSV file parser is better suited for malformed data. There are also options to specify the schema and column types directly, and to deal with missing values.

You can of course filter out nan and cast float to decimal in the usual way as well (this operates on CsvProvider.Row):

data 
|> Seq.filter (fun x -> not (Double.IsNaN x.Income))
|> Seq.map (fun x -> (x.Id, x.State, decimal x.Income))
//val it : seq<int * string * decimal> =seq [(40, "TX", 2000.1M); (15, "TX", 3000M)]

The data I used:

Id,State,Income
40,TX,2000.1
48,MO,#N/A
15,TX,3000
78,TN,
41,VT,
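
For a row with many float columns, as in the question, the same filter can be applied across every column before converting to decimal. The following is only a minimal sketch: the record type IncomeRow, its Values field, the function cleanRows and the sample data are hypothetical stand-ins for the provided row type, not part of FSharp.Data. With the real provider you would first collect the float columns of each row into a list.

```fsharp
open System

// Hypothetical stand-in for a provided row: one label plus several float columns.
type IncomeRow = { State: string; Values: float list }

// Drop every row that contains at least one NaN,
// then convert the remaining floats to decimals.
let cleanRows (rows: seq<IncomeRow>) : (string * decimal list) list =
    rows
    |> Seq.filter (fun row -> row.Values |> List.forall (fun v -> not (Double.IsNaN v)))
    |> Seq.map (fun row -> row.State, row.Values |> List.map decimal)
    |> List.ofSeq

// Made-up sample data, for illustration only.
let sample =
    [ { State = "TX"; Values = [ 2000.5; 3000.0 ] }
      { State = "TX"; Values = [ nan; nan ] }
      { State = "MO"; Values = [ 1.0; nan ] } ]

let cleaned = cleanRows sample
```

Here only the first TX row survives, because the other two rows each contain at least one NaN. Note that Double.IsNaN is applied to each float value, not to the row itself, which is what the FS0001 error in the question is pointing at.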

Upvotes: 1
