mickG

Reputation: 335

Calculate some statistics with Math.Net

I have some results that are stored in a multidimensional array:

double[,] results;

Each column is a time series of prices for a specific variable (e.g. "house", "car", "electricity"). I would like to calculate some statistics for each variable to summarize the results in a more compact form. For example, I was looking at the Percentile function in Math.Net.

I would like to calculate the 90th percentile of the prices for each column (so for each variable).

I am trying the following, since the function doesn't work on a multidimensional array (so I cannot pass results[,] as an argument to the Percentile function):

for (int i = 0, i <= results.GetLength(2), i++)
{
    myList.Add(MathNet.Numerics.Statistics.Statistics.Percentile(results[,i], 90));
}

So I want to loop through the columns of my results[,] and calculate the 90th percentile, adding the result to a list. But this doesn't work because of wrong syntax in results[, i]. Unfortunately, there is no other (clearer) error message.

Can you help me understand where the problem is and if there's a better way to calculate a percentile by column?

Upvotes: 1

Views: 2724

Answers (1)

dbc

Reputation: 116526

Percentile is an extension method with the following signature:

public static double Percentile(this IEnumerable<double> data, int p)

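C# has no syntax for slicing a column out of a multidimensional array, which is why results[,i] does not compile. Instead, you can use LINQ to transform your 2D array into an appropriate sequence to pass to Percentile.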

However, results.GetLength(2) will throw an IndexOutOfRangeException because the dimension argument of GetLength() is zero-based: a 2D array only has dimensions 0 (rows) and 1 (columns). You probably meant results.GetLength(1). Assuming that's what you meant, you can do:

        var query = Enumerable.Range(0, results.GetLength(1))
            .Select(iCol => Enumerable.Range(0, results.GetLength(0))
                .Select(iRow => results[iRow, iCol])
                .Percentile(90));

You can have LINQ build the list for you:

        var myList = query.ToList();

or add it to a pre-existing list:

        myList.AddRange(query);
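
If you would rather keep an explicit loop, as in the question, something along these lines should work. This is only a minimal sketch: the names ColumnStats, GetColumn and PerColumn are made up for illustration and are not part of Math.NET, and the statistic is passed in as a delegate so the sketch compiles without any package reference. Note that the for header is separated by semicolons and the condition is i < GetLength(1), not <=.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper class, not part of Math.NET.
public static class ColumnStats
{
    // Copies column iCol of a 2D array into a new 1D array.
    public static double[] GetColumn(double[,] data, int iCol)
    {
        int rows = data.GetLength(0);            // dimension 0 = rows
        var column = new double[rows];
        for (int iRow = 0; iRow < rows; iRow++)
            column[iRow] = data[iRow, iCol];
        return column;
    }

    // Applies the given statistic to every column and collects the results.
    public static List<double> PerColumn(double[,] data, Func<double[], double> statistic)
    {
        var result = new List<double>();
        for (int i = 0; i < data.GetLength(1); i++)   // dimension 1 = columns
            result.Add(statistic(GetColumn(data, i)));
        return result;
    }
}
```

With using MathNet.Numerics.Statistics; in scope, the call would then look something like var myList = ColumnStats.PerColumn(results, col => col.Percentile(90));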

update

To filter NaN values use double.IsNaN:

        var query = Enumerable.Range(0, results.GetLength(1))
            .Select(iCol => Enumerable.Range(0, results.GetLength(0))
                .Select(iRow => results[iRow, iCol])
                .Where(d => !double.IsNaN(d))
                .Percentile(90));

update

If one extracts a couple of array extensions:

using System;
using System.Collections.Generic;
using System.Linq;

public static class ArrayExtensions
{
    public static IEnumerable<IEnumerable<T>> Columns<T>(this T[,] array)
    {
        if (array == null)
            throw new ArgumentNullException(nameof(array));
        return Enumerable.Range(0, array.GetLength(1))
            .Select(iCol => Enumerable.Range(0, array.GetLength(0))
                .Select(iRow => array[iRow, iCol]));
    }

    public static IEnumerable<IEnumerable<T>> Rows<T>(this T[,] array)
    {
        if (array == null)
            throw new ArgumentNullException(nameof(array));
        return Enumerable.Range(0, array.GetLength(0))
            .Select(iRow => Enumerable.Range(0, array.GetLength(1))
                .Select(iCol => array[iRow, iCol]));
    }
}

Then the query becomes:

        var query = results.Columns().Select(col => col.Where(d => !double.IsNaN(d)).Percentile(90));

which seems much clearer.
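
The same pattern should work with any aggregate, not just Percentile. As a rough illustration that runs without Math.NET, the sketch below repeats the Columns() extension so it compiles on its own and swaps in LINQ's Max() as a stand-in statistic. ColumnsDemo and MaxPerColumn are hypothetical names, and the sketch assumes every column contains at least one non-NaN value, since Max() throws on an empty sequence.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ArrayExtensions
{
    // Same Columns() extension as in the answer, repeated so this sketch is self-contained.
    public static IEnumerable<IEnumerable<T>> Columns<T>(this T[,] array)
    {
        if (array == null)
            throw new ArgumentNullException(nameof(array));
        return Enumerable.Range(0, array.GetLength(1))
            .Select(iCol => Enumerable.Range(0, array.GetLength(0))
                .Select(iRow => array[iRow, iCol]));
    }
}

// Hypothetical demo class, for illustration only.
public static class ColumnsDemo
{
    // Largest non-NaN value in each column (Max() is a stand-in for Percentile(90)).
    public static List<double> MaxPerColumn(double[,] prices)
    {
        return prices.Columns()
            .Select(col => col.Where(d => !double.IsNaN(d)).Max())
            .ToList();   // ToList() runs the deferred query once and stores the results
    }
}
```

Because the LINQ query is deferred, calling ToList() (or AddRange) is what actually walks the array; enumerating the query a second time would repeat the work.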

Upvotes: 2
