dharmatech

Reputation: 9527

Equivalent of `shift` from pandas

Initial DataFrame in Pandas

Let's suppose we have the following in Python with pandas:

import pandas as pd

df = pd.DataFrame({
    "Col1": [10, 20, 15, 30, 45],
    "Col2": [13, 23, 18, 33, 48],
    "Col3": [17, 27, 22, 37, 52] },
    index=pd.date_range("2020-01-01", "2020-01-05"))

df

Here's what we get in Jupyter:

            Col1  Col2  Col3
2020-01-01    10    13    17
2020-01-02    20    23    27
2020-01-03    15    18    22
2020-01-04    30    33    37
2020-01-05    45    48    52

Shifting columns

Now let's shift Col1 by 2 and store it in Col4.

We'll also store df['Col1'] / df['Col1'].shift(2) in Col5:

df_2 = df.copy(deep=True)

df_2['Col4'] = df['Col1'].shift(2)

df_2['Col5'] = df['Col1'] / df['Col1'].shift(2)

df_2

The result:

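            Col1  Col2  Col3  Col4  Col5
2020-01-01    10    13    17   NaN   NaN
2020-01-02    20    23    27   NaN   NaN
2020-01-03    15    18    22  10.0   1.5
2020-01-04    30    33    37  20.0   1.5
2020-01-05    45    48    52  15.0   3.0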
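To make the semantics concrete, here is a minimal plain-Python sketch (no pandas) of what the two assignments above compute. The helper names `shift` and `divide` are made up for illustration, and `None` stands in for the NaN that pandas produces; this is not how pandas implements it.

```python
from typing import List, Optional


def shift(values: List[float], periods: int) -> List[Optional[float]]:
    """Move values down by `periods` rows, padding the top with None.

    Assumption: this sketch only handles periods >= 0.
    """
    if periods < 0:
        raise ValueError("this sketch only handles periods >= 0")
    padded = [None] * periods + list(values)
    # Keep the original length so the result lines up with the other columns.
    return padded[:len(values)]


def divide(numerators, denominators):
    """Element-wise division that yields None wherever either operand is missing."""
    return [None if a is None or b is None else a / b
            for a, b in zip(numerators, denominators)]


col1 = [10, 20, 15, 30, 45]
col4 = shift(col1, 2)
col5 = divide(col1, col4)

print(col4)  # [None, None, 10, 20, 15]
print(col5)  # [None, None, 1.5, 1.5, 3.0]
```

Any C# equivalent needs the same two pieces: a shift that pads with nulls, and a division that propagates them.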

C# version

Now let's set up a similar DataFrame in C#:

#r "nuget:Microsoft.Data.Analysis"

using System;
using System.Linq;
using Microsoft.Data.Analysis;

var df = new DataFrame(
    new PrimitiveDataFrameColumn<DateTime>("DateTime",
        Enumerable.Range(0, 5).Select(i => new DateTime(2020, 1, 1).AddDays(i))),
    new PrimitiveDataFrameColumn<int>("Col1", new []{ 10, 20, 15, 30, 45 }),
    new PrimitiveDataFrameColumn<int>("Col2", new []{ 13, 23, 18, 33, 48 }),
    new PrimitiveDataFrameColumn<int>("Col3", new []{ 17, 27, 22, 37, 52 })
);

df

The result in .NET Interactive:

[image: the DataFrame rendered in .NET Interactive, with columns DateTime, Col1, Col2, Col3]

Question

What's a good way to perform column shifts equivalent to those in the pandas version?

Notes

The above example is from the documentation for pandas.DataFrame.shift:

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.shift.html

Update

It does indeed look like there isn't currently a built-in shift in Microsoft.Data.Analysis. I've posted an issue for this here:

https://github.com/dotnet/machinelearning/issues/6008

Upvotes: 1

Views: 494

Answers (2)

markbeij

Reputation: 141

@dharmatech has a great answer and it should be marked as the correct answer.

I changed the function slightly to make it generic:

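using System.Linq;
using Microsoft.Data.Analysis;

// Shifts any primitive column down by n rows (n >= 0); the first n entries become null.
PrimitiveDataFrameColumn<T> ShiftIntColumn<T>(PrimitiveDataFrameColumn<T> col, int n, string name) where T : unmanaged
{
    return
        new PrimitiveDataFrameColumn<T>(
            name,
            Enumerable.Repeat((T?) null, n)            // n leading nulls
                .Concat(col.Select(item => (T?) item)) // then the original values
                .Take(col.Count()));                   // truncate to the original length
}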
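Note that the helper above assumes n is non-negative, since Enumerable.Repeat throws for a negative count. The pandas `shift` also accepts negative periods (shifting up) and a `fill_value`. As a language-neutral sketch of that fuller behaviour, here is a plain-Python version; the name `shift_any` is hypothetical and the code only illustrates the logic a C# port would need.

```python
from typing import List, Optional, TypeVar

T = TypeVar("T")


def shift_any(values: List[T], periods: int,
              fill_value: Optional[T] = None) -> List[Optional[T]]:
    """Shift down for periods > 0, up for periods < 0, padding with fill_value."""
    n = len(values)
    k = min(abs(periods), n)   # never pad more than the column length
    pad = [fill_value] * k
    if periods >= 0:
        # Drop the last k values and pad at the top.
        return pad + list(values[:n - k])
    # Drop the first k values and pad at the bottom.
    return list(values[k:]) + pad


col1 = [10, 20, 15, 30, 45]
print(shift_any(col1, 2))                 # [None, None, 10, 20, 15]
print(shift_any(col1, -2))                # [15, 30, 45, None, None]
print(shift_any(col1, 1, fill_value=0))   # [0, 10, 20, 15, 30]
```

In C#, the negative case could be handled with Skip followed by Concat of the padding, under the same length-preserving rule.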

Upvotes: 1

dharmatech

Reputation: 9527

Helper functions

Shift an int column down by n rows. The result is a double column whose first n entries are null.

PrimitiveDataFrameColumn<double> ShiftIntColumn(PrimitiveDataFrameColumn<int> col, int n, string name)
{
    return 
        new PrimitiveDataFrameColumn<double>(
            name,
            Enumerable.Repeat((double?) null, n)
                .Concat(col.Select(item => (double?) item))
                .Take(col.Count()));
}

Carry out the division, returning null wherever the divisor is null.

PrimitiveDataFrameColumn<double> DivAlt3(PrimitiveDataFrameColumn<int> a, PrimitiveDataFrameColumn<double> b, string name)
{
    return 
        new PrimitiveDataFrameColumn<double>(name, a.Zip(b, (x, y) => y == null ? null : x / y));
}

Then the following:

var df = new DataFrame(
    new PrimitiveDataFrameColumn<DateTime>("DateTime",
        Enumerable.Range(0, 5).Select(i => 
            new DateTime(2020, 1, 1).Add(new TimeSpan(i, 0, 0, 0)))),    
    new PrimitiveDataFrameColumn<int>("Col1", new []{ 10, 20, 15, 30, 45 }),
    new PrimitiveDataFrameColumn<int>("Col2", new []{ 13, 23, 18, 33, 48 }),
    new PrimitiveDataFrameColumn<int>("Col3", new []{ 17, 27, 22, 37, 52 })
);

df.Columns.Add(ShiftIntColumn((PrimitiveDataFrameColumn<int>)df["Col1"], 2, "Col4"));

df.Columns.Add(DivAlt3((PrimitiveDataFrameColumn<int>) df["Col1"], (PrimitiveDataFrameColumn<double>) df["Col4"], "Col5"));

results in:

[image: the DataFrame with the added columns Col4 (Col1 shifted by 2) and Col5 (Col1 / Col4)]

Complete notebook

See the following notebook for a full demonstration of the above:

https://github.com/dharmatech/dataframe-shift-example-cs/blob/003/dataframe-shift-example-cs.ipynb

Notes

  • It would be great if Microsoft.Data.Analysis came with column shift functionality.
  • It would also be great if column division handled nulls natively.

I'd love to see other, perhaps more idiomatic, approaches to this.

Upvotes: 1
