A9S6

Reputation: 6685

Packing record length in 2 bytes

I want to create an ASCII string that contains a number of fields. For example:

string s = f1 + "|" + f2 + "|" + f3;

f1, f2 and f3 are fields and "|" (pipe) is the delimiter. I want to avoid this delimiter and instead keep the field lengths at the beginning, like this:

string s = f1.Length + f2.Length + f3.Length + f1 + f2 + f3;

Each length is going to be packed in 2 chars, so the maximum length is 99 in this case (00-99). I was wondering if I can instead pack the length of each field in 2 bytes by extracting the bytes out of a short. This would allow a range of 0-65535 using only 2 bytes, provided the value is treated as unsigned (a signed short only reaches 32767). For example:

short length = 20005;
byte b1 = (byte)length;        // low-order byte
byte b2 = (byte)(length >> 8); // high-order byte
// Save bytes b1 and b2

// Read bytes b1 and b2
// (a separate variable name, since "length" is already declared above)
short decoded = (short)((b2 << 8) | b1);
// Now decoded is 20005

What do you think about the above code? Is this a good way to store the record lengths?

Upvotes: 1

Views: 1101

Answers (4)

Paul Farry

Reputation: 4768

If you want to get clever, you could use a variable-length encoding: values up to 127 take 1 byte, and values of 128 or more take 2 bytes. This technique loses 1 bit per byte used, but if your values are normally small and only occasionally large, the encoding grows dynamically to accommodate them.

All you need to do is set bit 8 (the high bit) of a byte to indicate that the 2nd byte must also be read. If bit 8 of the current byte is not set, the value is complete.

E.g. a value of 4 is stored as:

|8|7|6|5|4|3|2|1|
|0|0|0|0|0|1|0|0|

For a value of 128, you read the 1st byte, see that bit 8 is high, and keep the remaining 7 bits of that byte. Then you do the same with the 2nd byte, shifting its 7 bits left by 7 bits before combining them with the first 7.

|BYTE 0         |BYTE 1         |
|8|7|6|5|4|3|2|1|8|7|6|5|4|3|2|1|
|1|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|
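
The scheme above can be sketched in C# roughly as follows. This is only an illustration under some assumptions: the class name `VarLength` and its `Encode`/`Decode` methods are hypothetical, the low-order 7 bits are written first (matching the diagram for 128), and no checks are made for truncated or malformed input.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper illustrating the 7-bits-per-byte scheme.
public static class VarLength
{
    // Encodes a value 7 bits at a time, low-order group first.
    // Bit 8 (0x80) of a byte is set when another byte follows.
    public static byte[] Encode(uint value)
    {
        var bytes = new List<byte>();
        while (value >= 0x80)
        {
            bytes.Add((byte)((value & 0x7F) | 0x80));
            value >>= 7;
        }
        bytes.Add((byte)value); // last byte: bit 8 clear
        return bytes.ToArray();
    }

    // Decodes one value starting at 'offset' and advances 'offset' past it.
    public static uint Decode(byte[] data, ref int offset)
    {
        uint result = 0;
        int shift = 0;
        while (true)
        {
            byte b = data[offset++];
            result |= (uint)(b & 0x7F) << shift;
            if ((b & 0x80) == 0)
                return result; // bit 8 clear: value is complete
            shift += 7;
        }
    }
}
```

With this sketch, `VarLength.Encode(4)` yields the single byte 0x04 and `VarLength.Encode(128)` yields 0x80, 0x01. Note that two bytes carry only 14 bits of payload, so lengths above 16383 need a third byte.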

Upvotes: 0

Gareth McCaughan

Reputation: 19981

Whether it's a good idea depends on the details of what it's for, but it's not likely to be good.

If you do this then you're no longer creating an "ASCII string". Those were your words, but maybe you don't really care whether it's ASCII.

You will sometimes get bytes with a value of 0 in your "string". If you're handling the strings with anything written in C, this is likely to cause trouble. You'll also get all sorts of other characters -- newlines, tabs, commas, etc. -- that may confuse software that's trying to work with your data.

The original plan of separating with (say) | characters will be more compact and easier for humans and software to read. The only obvious downsides are (1) you can't allow field values with a | in (or else you need some sort of escaping) and (2) parsing will be marginally slower.

Upvotes: 0

Marc Gravell

Reputation: 1064244

If you want ASCII, i.e. "00" as characters, then just:

byte[] bytes = Encoding.ASCII.GetBytes(length.ToString("00"));

or you could optimise it if you want.

But IMO, if you are storing 0-99, 1 byte is plenty:

byte b = (byte)length;

If you want the range 0-65535, then just:

bytes[0] = (byte)length;
bytes[1] = (byte)(length >> 8);

or swap index 0 and 1 for endianness.

But if you are using the full range (of either a single or a double byte), then it is neither ASCII nor a string. Anything that tries to read it as a string might fail.
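
For comparison, here is a minimal sketch of the all-ASCII layout from the question, with 2-digit lengths up front followed by the field values. The class `AsciiRecord` and its methods are hypothetical names, and the reader is assumed to know the field count in advance, since the format itself does not store it.

```csharp
using System;
using System.Text;

// Hypothetical helper for the "lengths first, then fields" ASCII layout.
public static class AsciiRecord
{
    // Builds "<len1><len2>...<f1><f2>..." with each length as 2 digits.
    public static string Pack(params string[] fields)
    {
        var header = new StringBuilder();
        var body = new StringBuilder();
        foreach (string f in fields)
        {
            if (f.Length > 99)
                throw new ArgumentException("Field longer than 99 characters.");
            header.Append(f.Length.ToString("00"));
            body.Append(f);
        }
        return header.ToString() + body.ToString();
    }

    // Splits a record back into fields; fieldCount must be known up front.
    public static string[] Unpack(string record, int fieldCount)
    {
        var fields = new string[fieldCount];
        int pos = fieldCount * 2; // field data starts after the length header
        for (int i = 0; i < fieldCount; i++)
        {
            int len = int.Parse(record.Substring(i * 2, 2));
            fields[i] = record.Substring(pos, len);
            pos += len;
        }
        return fields;
    }
}
```

For example, `AsciiRecord.Pack("ab", "", "hello")` produces "020005abhello", which stays printable ASCII at the cost of capping each field at 99 characters.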

Upvotes: 0

Aliostad

Reputation: 81700

I cannot see what you are trying to achieve. A short (aka Int16) is 2 bytes, so yes, you can happily use it. But creating a string from it does not make sense.

ushort sh = 56100; // 2 bytes; 56100 exceeds short.MaxValue (32767), so an unsigned short is needed

I believe you mean being able to output the short to a stream. There are ways to do this:

  • BinaryWriter.Write(sh) which writes 2 bytes straight to the stream
  • BitConverter.GetBytes(sh) which gives you bytes of a short

To read the value back, use the counterparts: BinaryReader.ReadInt16 (or ReadUInt16) and BitConverter.ToInt16 (or ToUInt16).
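
A minimal round trip through a stream might look like the sketch below. The class `LengthIo` and its method names are hypothetical, and a `MemoryStream` stands in for whatever stream is really being written to.

```csharp
using System;
using System.IO;

// Hypothetical helper showing a 2-byte length written to and read from a stream.
public static class LengthIo
{
    // Writes the length as 2 bytes (BinaryWriter uses little-endian order).
    public static byte[] WriteLength(ushort length)
    {
        using (var ms = new MemoryStream())
        {
            using (var writer = new BinaryWriter(ms))
            {
                writer.Write(length);
            }
            return ms.ToArray(); // ToArray still works after the stream is closed
        }
    }

    // Reads the 2-byte length back from the start of the data.
    public static ushort ReadLength(byte[] data)
    {
        using (var reader = new BinaryReader(new MemoryStream(data)))
        {
            return reader.ReadUInt16();
        }
    }
}
```

`BinaryWriter` always writes the low-order byte first, whereas the byte order of `BitConverter.GetBytes` follows the host architecture (see `BitConverter.IsLittleEndian`), which matters if the data is exchanged between machines.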

Upvotes: 1
