jim108dev
jim108dev

Reputation: 101

Rearrange rows of arbitrary data and columns in Deedle

I would like to rearrange row values to columns. In Rearrange rows and columns in Deedle this was done via Frame.pivotTable. The targeted value was of type int such that the series could be aggregated by Sum(). How to pivot arbitrary types like strings or dates and preserve missing values?

For example

open Deedle

let str = "Key;Attribute;Value\nK1;A;01.01.2000\nK1;B;\nK2;A;02.02.2020\nK2;B;03.03.2030"
let bytes = System.Text.Encoding.UTF8.GetBytes str
let stream =  new MemoryStream( bytes )

let df =
    Frame.ReadCsv(
        stream = stream,
        separators = ";",
        hasHeaders = true
    )

df.Print(true)
     Key      Attribute Value                
     (string) (string)  (DateTime)           
0 -> K1       A         1/1/2000 12:00:00 AM 
1 -> K1       B         <missing>            
2 -> K2       A         2/2/2020 12:00:00 AM 
3 -> K2       B         3/3/2030 12:00:00 AM 
let df1 = 
    df
    |> Frame.pivotTable 
        (fun k r -> r.GetAs<string>("Key")) 
        (fun k r -> r.GetAs<string>("Attribute")) 
        (fun frm -> frm.GetColumn("Value").Sum())  //<-- Different aggregation needed

df1.Print(true)

Result should be

      A                     B                                   
      (DateTime)            (DateTime)                          
K1 -> 1/1/2000 12:00:00 AM  <missing>            
K2 -> 2/2/2020 12:00:00 AM  3/3/2030 12:00:00 AM                      

Upvotes: 1

Views: 83

Answers (1)

Brian Berns
Brian Berns

Reputation: 17133

I think there are two problems here. One is that you need an aggregation function that works with DateTime, and the other is that you need a way to handle missing values in the input. (The original SO question you linked to doesn't have the second issue.)

The first problem is easy to deal with. For example, TryMin will return an Option<DateTime> instead of a float:

let df1 = 
    df
    |> Frame.pivotTable 
        (fun k r -> r.GetAs<string>("Key")) 
        (fun k r -> r.GetAs<string>("Attribute")) 
        (fun frm ->
            frm.GetColumn("Value").TryMin()
                |> Option.toNullable<DateTime>)

Result is:

      A                    B
      (Nullable`1)         (Nullable`1)
K1 -> 1/1/2000 12:00:00 AM <missing>
K2 -> 2/2/2020 12:00:00 AM 3/3/2030 12:00:00 AM

This isn't quite what you want, though, because the values are now of type Nullable<DateTime> instead of just DateTime. Unfortunately, I don't know of a way around this using pivotTable, but someone else who knows more about Deedle might be able to help.

Upvotes: 0

Related Questions