casbby

Reputation: 896

Immutable or not? Deedle frame filtering

This question might look trivial, but it does come up in our process because the data is not clean. I have a data frame that looks like this:

open System
open Deedle

let tt = Series.ofObservations [ 1 => 10.0; 3 => 20.0; 5 => 30.0; 6 => 40.0 ]
let tt2 = Series.ofObservations [ 1 => Double.NaN; 3 => 5.5; 6 => Double.NaN ]
let tt3 = Series.ofObservations [ 1 => "aaa"; 3 => "bb"; 6 => "ccc" ]
let f1 = frame [ "cola" => tt; "colb" => tt2 ]
f1.AddColumn("colc", tt3)

 f1.Print();;
     cola colb      colc      
1 -> 10   <missing> aaa       
3 -> 20   5.5       bb        
5 -> 30   <missing> <missing> 
6 -> 40   <missing> ccc   

I need to drop every row before the first row that has a value in colb:

     cola colb      colc      
3 -> 20   5.5       bb        
5 -> 30   <missing> <missing> 
6 -> 40   <missing> ccc

The only solution I can come up with uses a mutable flag, which breaks the integrity of functional programming. Maybe this filtering of the missing head can be hidden in a library, but it still makes me wonder whether I am doing it the right way.

let flag = ref false
let filteredF1 = f1 |> Frame.filterRows(fun k v -> 
                                  match !flag, v.TryGetAs<float>("colb") with 
                                  | false, OptionalValue.Missing -> flag := false
                                  | false, _ -> flag := true
                                  | true, _ -> ()
                                  !flag
                                  ) 

This is not really a Deedle problem; it has more to do with how this should be achieved with immutability. Something easily achievable in Python and VBA seems to be very hard to do in F#.

In statistical calculations, situations like this arise when multiple series have different starting times. After the starting point, retaining the data points that contain missing values is important, because a missing value means something.

Any advice is appreciated. cassby

Upvotes: 0

Views: 144

Answers (2)

Adam Klein

Reputation: 476

Here is my preferred way:

// find first index having non-null value in column b
let idx = 
  f1?colb 
  |> Series.observationsAll 
  |> Seq.skipWhile (function | (_, None) -> true | _ -> false) 
  |> Seq.head 
  |> fst;;

// slice frame
f1.Rows.[idx .. ];;

Upvotes: 1

nodakai

Reputation: 8033

If you wrap your code into a function (I modified it a little, but have not tested it at all!!)

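let dropTil1stNonMissingB frame =
  let flag = ref false
  // v is annotated so that the TryGetAs method call type-checks
  let kernel k (v: ObjectSeries<string>) =
    // once a row with a value in colb has been seen, the flag stays true
    flag := !flag || v.TryGetAs<float>("colb").HasValue
    !flag
  Frame.filterRows kernel frame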

then your code just looks purely functional:

let filteredF1 = f1 |> dropTil1stNonMissingB

As long as the use of the reference is restricted to a narrow scope, it should be acceptable. Immutability is not the final goal of functional programming. It's only a guiding principle for writing good code.

In fact the Deedle developers should have provided their version of Seq.fold for Frame:

Then you could have used it with (new Frame([],[]), false) as the initial 'State. Roughly speaking, you should be able to translate any loops in C, Python or whatever imperative language to fold (aka fold_left or foldl), though it isn't necessarily the way to go.
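To make the fold idea concrete without relying on any Deedle API, here is a minimal sketch over a plain list. The names rows, step and keptKeys are made up for illustration, and None stands in for a missing colb value. The state is the pair (rows kept so far, whether a value has been seen), which plays the role of the mutable flag.

```fsharp
// Rows modelled as (key, colb value) pairs; None stands in for a missing value.
// This is an assumption for illustration, not how Deedle stores a frame.
let rows = [ 1, None; 3, Some 5.5; 5, None; 6, None ]

// State: (rows kept so far, in reverse order; has a value been seen yet?)
let step (kept, seen) (key, value) =
    let seen' = seen || Option.isSome value
    (if seen' then (key, value) :: kept else kept), seen'

let keptKeys =
    rows
    |> List.fold step ([], false)   // thread the state through every row
    |> fst                          // discard the flag, keep the rows
    |> List.rev                     // restore the original order
    |> List.map fst
// keptKeys = [3; 5; 6]
```

The resulting keys could then be used to pick the matching rows out of the frame.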

You might as well define it as an extension method of Frame.

type Frame with
  member frame.DropTil1stNonMissingB =
    ...

let filteredF1 = f1.DropTil1stNonMissingB

Upvotes: 0
