S. Morgan
S. Morgan

Reputation: 11

Converting C strings to Pascal strings

When converting a C string into a Pascal string, why should the length of the original string be less or equal to 127 instead of 256? I understand that an unsigned int ranges from 0~256 and a signed one ranges from -128~127, but isn't the first character of a Pascal string unsigned?

Upvotes: 1

Views: 882

Answers (2)

Rudy Velthuis
Rudy Velthuis

Reputation: 28846

The Pascal string you are referring to is probably the one used in older Pascals (called ShortString in e.g. Delphi and FreePascal, the most popular Pascal implementations these days). That can contain 255 single-byte characters (char in C). There is no need to restrict this to 127 characters.

Perhaps you were thinking of the fact that 255 bytes can only contain 127 UTF-16 code points. But these strings were popular in the old CP/M and DOS days, when no one knew anything about Unicode yet, and were made to contain ASCII or "Extended ASCII" (8 bit, using code pages).

But most modern Pascal implementations allow you to use strings up to 2 GB in size. There, the length indicator is not stored as the first element anymore, just close to the text data. And these days, most of these strings can contain Unicode too, either as UTF-16 or as UTF-8, depending on the string type you choose (modern Pascal implementations have several different string types for different purposes, so there is not one single "Pascal string type" anymore).

Some languages do have the ability to restrict the size of a ShortString, as so called "counted" strings:

var
  s: string[18];

That string has a maximum of 18 bytes text data and 1 byte length data (at index 0). Such shorter strings can be used in, say, records, so they don't grow too big.

Upvotes: 2

Dai
Dai

Reputation: 155708

FreePascal's wiki has a great page showing all the types of strings that Pascal (at least that implementation) supports: http://wiki.freepascal.org/Character_and_string_types - it includes length-prefixed and null-terminated string types. None of the types on that page have a length restriction of 127.

The string type you're referring to would match ShortString which has a single byte prefix, however their documentation states it accepts 0-255.

I am aware of a string-type that has a variable-length-integer prefix, which would restrict the length of the string to 127 characters if you want the in-memory representation to be binary-compatible with ShortString, as being 128 characters or longer would set the MSB bit to 1 which in variable-length-integers means the integer is at least 2 bytes long instead of 1 byte.

Upvotes: 1

Related Questions