Sorting a Maple dataframe by the contents of a column

Question

I have a dataset stored in a Maple DataFrame that I'd like to sort by the values in a given column. My real example is larger, but the data has the same shape: two columns, one holding numeric values and the other holding strings. For example, say I have a DataFrame constructed as:

Mydata := DataFrame(<<2,1,3,0>|<"Red","Blue","Green","Orange">>, columns = [Value,Color] );

I'd like something like the sort command to return the same DataFrame with its rows ordered by the Value column, in ascending or descending order, but sort doesn't seem to support DataFrames. Any ideas on how I can sort this?

DSkoog · Accepted Answer

You're right that the sort command doesn't currently support DataFrames (but it should!). I've gotten around this by converting the DataFrame column (a DataSeries) to a Vector, sorting that Vector with the output = permutation option, and then indexing the DataFrame by the result. Using your example:

Mydata := DataFrame(<<2,1,3,0>|<"Red","Blue","Green","Orange">>, columns = [Value,Color] );
sort( convert( Mydata[Value], Vector ), output = permutation );

Which returns:

    [4, 2, 1, 3]
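Each entry of this permutation is a row position: the first entry says which row holds the smallest Value, and so on. A quick way to check this is to index the converted Vector by the permutation. This is a minimal sketch; the names V and P are just illustrative:

```maple
Mydata := DataFrame( <<2,1,3,0>|<"Red","Blue","Green","Orange">>, columns = [Value,Color] ):
V := convert( Mydata[Value], Vector ):   # Value column as a plain Vector
P := sort( V, output = permutation ):    # row positions in ascending order of Value
V[P];                                    # V rearranged by P: the values 0, 1, 2, 3
```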

Indexing the original DataFrame by this result then returns the sorted DataFrame in ascending order of the Value column:

Mydata[ sort( convert( Mydata[Value], Vector ), output = permutation ), .. ];
Mydata[ [4, 2, 1, 3], .. ];   # equivalent, using the permutation computed above

returns:

        [     Value     Color  ]
        [                      ]
        [4      0      "Orange"]
        [                      ]
        [2      1       "Blue" ]
        [                      ]
        [1      2       "Red"  ]
        [                      ]
        [3      3      "Green" ]
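The same conversion approach also gives a descending order if you pass `>` to sort as the ordering function (this relies on the Value column being numeric). A minimal sketch with your example data:

```maple
Mydata := DataFrame( <<2,1,3,0>|<"Red","Blue","Green","Orange">>, columns = [Value,Color] ):
# `>` as the ordering function gives a descending permutation of the Value column
P := sort( convert( Mydata[Value], Vector ), `>`, output = permutation ):   # [3, 1, 2, 4]
Mydata[ P, .. ];   # rows reordered so that Value reads 3, 2, 1, 0
```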

That said, I've needed to sort DataFrames a number of times, so I also wrote a procedure that seems to work for most of my data sets. It uses the same sort-based approach, but it doesn't require any data conversions, since it works on the Maple DataFrame object itself. To do that, it sets kernelopts(opaquemodules = false) so that it can read the internal DataFrame data directly (you could instead make a series of conversions to intermediate Matrices and Vectors, but this approach limits the amount of duplicate internal data being created):

DSort := proc( self::{DataFrame,DataSeries}, {ByColumn := NULL} )
    local opacity, orderindex;
    # Expose module locals so the internal data of the DataFrame/DataSeries can be read directly
    opacity := kernelopts( 'opaquemodules' = false );
    try
        if type( self, ':-DataFrame' ) and ByColumn <> NULL then
            # Permutation that sorts the chosen column; extra arguments (e.g. `>`) go to sort
            orderindex := sort( self[ByColumn]:-data, _rest, ':-output' = ':-permutation' );
        elif type( self, ':-DataSeries' ) and ByColumn = NULL then
            orderindex := sort( self:-data, _rest, ':-output' = ':-permutation' );
        else
            # Nothing to sort by: return the input unchanged
            return self;
        end if;
    finally
        # Always restore the original opaquemodules setting, even on an early return or error
        kernelopts( 'opaquemodules' = opacity );
    end try;
    if type( self, ':-DataFrame' ) then
        return DataFrame( self[ orderindex, .. ] );
    else
        return DataSeries( self[ orderindex ] );
    end if;
end proc:

For example:

DSort( Mydata, ByColumn=Value );

returns:

        [     Value     Color  ]
        [                      ]
        [4      0      "Orange"]
        [                      ]
        [2      1       "Blue" ]
        [                      ]
        [1      2       "Red"  ]
        [                      ]
        [3      3      "Green" ]

This also works on string columns, so DSort( Mydata, ByColumn=Color ); sorts the rows alphabetically by Color:

        [     Value     Color  ]
        [                      ]
        [2      1       "Blue" ]
        [                      ]
        [3      3      "Green" ]
        [                      ]
        [4      0      "Orange"]
        [                      ]
        [1      2       "Red"  ]

In this procedure, any additional arguments are passed through to the sort command, so you can also supply an ordering. For example, DSort( Mydata, ByColumn=Value, `>` ); returns the DataFrame in descending Value order (this doesn't seem to play well with strings, though, since `>` is meant for numeric values).
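For a descending sort on a string column, one option is a custom ordering function built on lexorder. The sketch below also avoids touching opaquemodules by reusing the conversion approach from the first part of this answer. DSortConv is a hypothetical helper (not part of Maple), and it assumes that any extra arguments are forwarded to sort as its ordering function:

```maple
# DSortConv: hypothetical helper, a sketch that sorts a DataFrame by one column
# using only public conversions (no kernelopts changes), at the cost of copying
# that column into a Vector. Extra arguments are passed to sort as the ordering.
DSortConv := proc( df::DataFrame, col )
    local perm;
    perm := sort( convert( df[col], Vector ), _rest, ':-output' = ':-permutation' );
    return df[ perm, .. ];
end proc:

Mydata := DataFrame( <<2,1,3,0>|<"Red","Blue","Green","Orange">>, columns = [Value,Color] ):

# (a, b) -> lexorder(b, a) places a before b whenever b precedes or equals a
# lexicographically, i.e. a descending string order.
R := DSortConv( Mydata, Color, (a, b) -> lexorder(b, a) );   # rows ordered Red, Orange, Green, Blue
```

Called without an ordering, for example DSortConv( Mydata, Value );, it falls back to sort's default ascending order.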
