Avrohom Yisroel
Avrohom Yisroel

Reputation: 9440

Is there a more efficient way of compressing strings of digits when we know something about the structure of the data?

We have loyalty cards (like credit/debit cards, but processed by our bespoke code, as opposed to ones processed by interfacing with the banks). We need to store transaction data on the cards, as many transactions will be made using offline devices, and only uploaded when the card is next tapped on an online terminal.

Card storage space if limited (typically max 8Kb unless you pay silly prices for very smart cards), so I need to compress the data as much as possible.

Our transaction data is made up of three parts, all of which involve digits only (ie not alphabetic or special characters)...

Representing this as a string of digits gives 37 digits per transaction.

I tried using the algorithms in System.IO.Compression (following the code in this blog post, and the accompanying GitHub repo, not included here as it's bog-standard usage of the classes).

This gave some quite impressive results, with around 72% reduction using the optimal Gzip algorithm.

However, I was wondering if it would be possible to improve on this, given that we know something about the shape of the transaction data. For example, the date/time part of the data breaks down as follows...

Anyone any comment of whether or not these restrictions would help help me improve on this compression. Thanks

Upvotes: 4

Views: 180

Answers (3)

Dmitrii Bychenko
Dmitrii Bychenko

Reputation: 186668

We can compress the data into 118 bit (or 15 bytes). So far so good we have ranges:

  • Date and Time: 1 Jan 2000 0:0:0.000 up to 1 Jan 2100 0:0:0.000 which is 3_155_760_000_000 milliseconds
  • Serial number: 1_000_000_000_000_000_000 possible numbers
  • Amount: 1_000_00 in pennies

So we have in total:

double dt = (new DateTime(2100, 1, 1) - new DateTime(2000, 1, 1)).TotalMilliseconds;
double sn = 1_000_000_000_000_000_000L;
double amount = 1_000_00;

Console.Write(Math.Log2(dt * sn * amount));

The result is 117.925470... bits, 118 bits since we can't use bit partially

Edit: Compress and decompress routine:

private static byte[] MyCompress(DateTime date, long serial, decimal amount) {
  BigInteger ms = (long)(date - new DateTime(2000, 1, 1)).TotalMilliseconds;

  BigInteger value = 
    ms * 1_000_000_000_000_000_000L * 1_000_00 +
    (BigInteger)serial * 1_000_00 +
    (BigInteger)(amount * 100);

  byte[] result = new byte[15];

  for (int i = result.Length - 1; i >= 0; --i, value /= 256) 
    result[i] = (byte)(value % 256);

  return result;
}

private static (DateTime date, long serial, decimal amount) MyDecomress(byte[] data) {
  BigInteger value = data.Aggregate(BigInteger.Zero, (s, a) => s * 256 + a);

  BigInteger amount = value % 1_000_00;
  BigInteger serial = (value / 1_000_00) % 1_000_000_000_000_000_000L;
  BigInteger dt = value / 1_000_00 / 1_000_000_000_000_000_000L;

  return (
    new DateTime(2000, 1, 1).AddMilliseconds((double)dt),
    (long)serial,
    (decimal)amount / 100M
  );
}

Demo:

var data = MyCompress(new DateTime(2023, 1, 25, 21, 06, 45), 12345, 345.87m);

Console.WriteLine(string.Join(" ", data.Select(b => b.ToString("X2"))));

var back = MyDecomress(data);

Console.Write(back);

Output:

00 0E 05 4C 23 D7 34 A8 BD E8 F7 CC 3D 95 80 BB
(25.01.2023 21:06:45, 12345, 345.87)

Fiddle

Edit: If we can store date and time up to 1/10 second (not up to millsecond) we can use 14 bytes only:

private static byte[] MyCompress(DateTime date, long serial, decimal amount) {
  BigInteger ms = (long)(date - new DateTime(2000, 1, 1)).TotalMilliseconds / 100;

  BigInteger value = 
    ms * 1_000_000_000_000_000_000L * 1_000_00 +
    (BigInteger)serial * 1_000_00 +
    (BigInteger)(amount * 100);

  byte[] result = new byte[14];

  for (int i = result.Length - 1; i >= 0; --i, value /= 256) 
    result[i] = (byte)(value % 256);

  return result;
}

private static (DateTime date, long serial, decimal amount) MyDecomress(byte[] data) {
  BigInteger value = data.Aggregate(BigInteger.Zero, (s, a) => s * 256 + a);

  BigInteger amount = value % 1_000_00;
  BigInteger serial = (value / 1_000_00) % 1_000_000_000_000_000_000L;
  BigInteger dt = value / 1_000_00 / 1_000_000_000_000_000_000L;

  return (
    new DateTime(2000, 1, 1).AddMilliseconds((double)dt * 100),
    (long)serial,
    (decimal)amount / 100M
  );
}

Upvotes: 8

Kerk Dovan
Kerk Dovan

Reputation: 51

Solution #1 (old, 16 bytes):

You can save two digits (bytes) by using the mentioned restrictions:

  1. Combine month+day into dayOfYear (000-365) (for consistency assume there are always 29 days in February);
  2. Combine hours+minutes+seconds into timeInSeconds (00000-86399).

Note, that there are may be some other technics you could use to reduce the size of the string.

After this you can convert the number in the string from base 10 to base 256. Thus you get 16 bytes instead of 37. No mathematical proof, just practical result in the code by link (output at the bottom of the page). https://ideone.com/SMKb6S

Results:

initial: 39 999912312359599999999999999999999999999
base10: 37 9999365863999999999999999999999999999
base256: 16 [7, 133, 206, 204, 233, 237, 90, 213, 156, 154, 224, 34, 63, 255, 255, 255]
base62: 21 EC5zRr0FV71hggqe73b0J

And after this you can try some compression methods. However, as noted in comments, it may not work with small amount of data.

Solution #2 (15 bytes):

Actually, you can end up with 15 bytes. Dmitry Bychenko in his answer used microseconds instead of milliseconds (I don't have enough reputation to point that out in comment). Fixed. So, 128 years will be 4_047_667_200_000 milliseconds (or something like that).

All the data fits in 15 bytes, and some bits are even left free. You can use them to increase the maximum amount, for example. Here are calculations in Python: https://ideone.com/37Bie3

Results:

Target bytes: 15 (120 bits)
Years: 64
  Total bits: 120
  Max amount: £41943.04 (22 bits, 5 free bits used)
Years: 128
  Total bits: 120
  Max amount: £20971.52 (21 bits, 4 free bits used)
Years: 256
  Total bits: 120
  Max amount: £10485.76 (20 bits, 3 free bits used)
Years: 512
  Total bits: 120
  Max amount: £5242.88 (19 bits, 2 free bits used)
Years: 1024
  Total bits: 120
  Max amount: £2621.44 (18 bits, 1 free bits used)
Years: 2048
  Total bits: 120
  Max amount: £1310.72 (17 bits, 0 free bits used)

Edit: perform some formatting to the solution #1, add solution #2.

Upvotes: 5

Jeffrey Parks
Jeffrey Parks

Reputation: 554

Instead of trying to compress the text version of the data, consider your data and store it in a more efficient format.

A date can be stored in seconds since EPOCH time (EDIT) ticks of a DateTime object which should take 8 bytes (unsigned long).

Your device serial number can be stored in an unsigned long as well, and if there are any leading 0s they can be assumed if its always a fixed 17 digits.

Your amount can be stored in an unsigned int in the range 0 to 99999 and assume the last two digits are after a decimal point.

This gives you a total size of 8 + 8 + 4 = 20 bytes.

Upvotes: 0

Related Questions