Reputation: 3642
I have this code which I think I found somewhere on the internet some years ago and it doesn't quite work.
The purpose is to take any string and from that create a string that is lexicographically sorted by a large number - because then inverse (descending) ordering can be achieved by subtracting the number from another even larger number.
private static BigInteger maxSort = new BigInteger(Encoding.Unicode.GetBytes("5335522543087813528200259404529154678271640415603227881439560533607051111046319775598721171814499900"));
public static string GetSortString(string str, bool descending)
{
var sortNumber = new BigInteger(Encoding.Unicode.GetBytes(str));
if (descending)
{
sortNumber = maxSort - sortNumber;
}
return "$SORT!" + sortNumber.ToString().PadLeft(100, '0') + ":" + str;
}
The reason I need this is because I want to use it to insert as RowKey in Azure Table Storage which is the only way to sort in Table Storage. I need to sort any text, any number and any date, both ascending and descending.
Can anyone see the issue with the code or have any code that serves the same purpose?
The question is tagged with C# but of course this is not a question of syntax so if you have the answer in any other code that would be fine too.
Example
I want to convert any string to a number which is lexicographically sorted correctly - because if it's a number, then I can invert it and sort descending.
So for example, if I can convert:
ABBA to 1234
Beatles to 3131
ZZ Top to 9584
Then those numbers would sort them correctly ... and, if I subtract them from a large number, I would be able to invert the sort order:
10000 - 1234 = 8766
10000 - 3131 = 6869
10000 - 9584 = 0416
Of course, to support longer text input, I need to subtract them from a very large number, which is why I use the very large BigInteger.
Current output from this code
ABBA: $SORT!0000000000000000000000018296156958359617:ABBA
Beatles: $SORT!0000000009111360792640460912278748069954:Beatles
ZZ TOP: $SORT!0000000000000096715522885596192519618650:ZZ TOP
As you can see, the longest text gets the highest number. I have also tried to add padding immediately on the input str
, but that didnt help either.
Answer
The accepted answer worked. For descending sort order, the "BigInteger" trick from above could be used.
There is some limitation as to how long the sortable string can be.
Here is the final code:
private static BigInteger maxSort = new BigInteger(Encoding.Unicode.GetBytes("5335522543087813528200259404529154678271640415603227881439560533607051111046319775598721171814499900"));
public static string GetSortString(string str, bool descending)
{
BigInteger result = 0;
int maxLength = 42;
foreach (var c in str.ToCharArray())
{
result = result * 256 + c;
}
for (int i = str.Length; i < maxLength; i++)
{
result = result * 256;
}
if (descending)
{
result = maxSort - result;
}
return "$SORT!" + result;
}
Upvotes: 0
Views: 405
Reputation: 16109
If you were looking for a way to give a a value to any string so that you could sort them accordingly to the number and get the same result as above you can't. The reason is that strings don't have any length limit. Because you can always add a char to a string and thereby get a larger number even through it should have a lower lexicographical value.
If they have a length limit you can do something like this
pseudo code
bignum res = 0;
maxLength = 42;
for (char c : string)
res = res * 256 + c
for (int i = string.length; i < maxLength; i++)
res = res *256
If you want to optimize a bit, the last loop could be a lookup table. If your only using a-z, the times 256 could reduced to 26 or 32.
Upvotes: 1