Reputation: 25
I have a dataset stored in a Maple DataFrame that I'd like to sort by the values in a given column. My real data is larger, but it has the same shape: two columns, one holding numeric values and the other holding strings. For example, say I have a DataFrame constructed as:
Mydata := DataFrame(<<2,1,3,0>|<"Red","Blue","Green","Orange">>, columns = [Value,Color] );
I'd like something like the sort command to be able to return the same dataframe with the numbers in the Value column sorted in ascending or descending order, but the sort command doesn't seem to support dataframes. Any ideas on how I can sort this?
Upvotes: 1
Views: 313
Reputation: 657
You're right that the sort command doesn't currently support DataFrames (but it should!). I've gotten around this by converting the DataFrame column (a DataSeries) to a Vector, sorting the Vector with the output = permutation option, and then indexing the DataFrame by the result. Using your example:
Mydata := DataFrame(<<2,1,3,0>|<"Red","Blue","Green","Orange">>, columns = [Value,Color] );
sort( convert( Mydata[Value], Vector ), output = permutation );
Which returns:
[4, 2, 1, 3]
Indexing the original DataFrame by this result then returns the sorted DataFrame in ascending order of the Value column:
Mydata[ sort( convert( Mydata[Value], Vector ), output = permutation ), .. ];
Mydata[ [4, 2, 1, 3], .. ];
returns:
[     Value    Color  ]
[                     ]
[4      0     "Orange"]
[                     ]
[2      1     "Blue"  ]
[                     ]
[1      2     "Red"   ]
[                     ]
[3      3     "Green" ]
That said, I have needed to sort DataFrames a number of times, so I have also created a procedure that seems to work for most of my data sets. It takes the same approach of calling sort, but it doesn't require any data conversions since it works on the Maple DataFrame object itself. To do so, I need to set kernelopts(opaquemodules = false) in order to work directly with the internal DataFrame data object (you could also make a bunch of conversions to intermediate Matrices and Vectors, but this approach limits the amount of duplicate internal data being created):
DSort := proc( self::{DataFrame,DataSeries}, {ByColumn := NULL} )
    local opacity, orderindex;
    # Expose module internals so the :-data field can be read directly.
    # kernelopts returns the previous setting, which is restored below.
    opacity := kernelopts('opaquemodules' = false):
    try
        if type( self, ':-DataFrame' ) and ByColumn <> NULL then
            # Sort permutation of the chosen column; _rest forwards extra arguments to sort
            orderindex := sort( self[ByColumn]:-data, ':-output' = ':-permutation', _rest );
        elif type( self, ':-DataSeries' ) and ByColumn = NULL then
            orderindex := sort( self:-data, ':-output' = ':-permutation', _rest );
        else
            return self;   # No usable sort key: return the input unchanged
        end if;
    finally
        # Set opaquemodules back to its original setting, even on early return or error
        kernelopts('opaquemodules' = opacity):
    end try;
    if type( self, ':-DataFrame' ) then
        return DataFrame( self[ orderindex, .. ] );
    else
        return DataSeries( self[ orderindex ] );
    end if;
end proc:
For example:
DSort( Mydata, ByColumn=Value );
returns:
[     Value    Color  ]
[                     ]
[4      0     "Orange"]
[                     ]
[2      1     "Blue"  ]
[                     ]
[1      2     "Red"   ]
[                     ]
[3      3     "Green" ]
This also works on strings. For example:
DSort( Mydata, ByColumn=Color );
returns:
[     Value    Color  ]
[                     ]
[2      1     "Blue"  ]
[                     ]
[3      3     "Green" ]
[                     ]
[4      0     "Orange"]
[                     ]
[1      2     "Red"   ]
In this procedure, any additional arguments are passed through to the sort command, which means that you can also supply an ascending or descending ordering. For example, DSort( Mydata, ByColumn=Value, `>` ); returns the DataFrame in descending 'Value' order (this doesn't seem to play well with strings though).
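For a one-off descending sort without the procedure, the inline permutation approach from the start of this answer should extend the same way. This is a minimal sketch, assuming sort accepts the `>` ordering together with output = permutation on a numeric Vector; the name perm is just a placeholder introduced for illustration:

```maple
# Sketch only: assumes sort accepts `>` alongside output = permutation.
Mydata := DataFrame(<<2,1,3,0>|<"Red","Blue","Green","Orange">>, columns = [Value,Color] );

# Row order that puts the Value column in descending order
# (perm is a hypothetical helper name, not part of the original answer)
perm := sort( convert( Mydata[Value], Vector ), `>`, output = permutation );

# Index the rows by that permutation; ".." keeps every column
Mydata[ perm, .. ];
```

Storing the permutation in a variable also lets you reuse the same row order on other DataFrames or DataSeries that share the same rows.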
Upvotes: 3