lkessler

Reputation: 20132

Encoding.GetString Is Not Returning the String

I'm using Delphi 2009.

This works for me in all cases but one:

var
  BOMLength: integer;
  Buffer: TBytes;
  Encoding: TEncoding;
  Value: string;

SetLength(Buffer, 2048);
CurFileStream.Read(Buffer[0], 2048);

Encoding := nil;
BOMLength := TEncoding.GetBufferEncoding(Buffer, Encoding);
Value := Encoding.GetString(Buffer);

In the one case it doesn't work, the file is a small simple one and starts off with a UTF8 Byte Order Mark (BOM), i.e. hex: 'EF BB BF' and contains the following:

0 HEAD
0 @I1@ INDI
1 NAME Barthel Lee /Brenner/
2 CONT MAURICE F. WEAVER
2 CONT  When I was eleven or twelve years old, I went to Camp Marguette for a w
2 CONC eek or two in the summertime. It was operated by Catholic Charities and w
0 TRLR

After the call to CurFileStream.Read, when I inspect Buffer, it contains the BOM followed by the file contents, with zeros filling the rest of the 2048 bytes of the buffer. GetBufferEncoding correctly detected the UTF-8 BOM and returned a BOMLength of 3.

However, after the Encoding.GetString statement, Value is the empty string: ''.

I have put a try-except block around this to try to catch any exceptions, but there are none.

The code works for 500 other files of different types, but not for this one.

Does anyone know what I can do to fix this so that the file is correctly read?

Or maybe there is something wrong with the file, but I'm not sure what's different about it, or how to identify what might be different or wrong.


Followup:

Remy's answer is correct. I have now determined that only small files, shorter than the buffer size (2048 bytes in my case), fail when the offset and byte count are not passed to GetString.

As I noted, the remaining part of the buffer is filled with zeros. This appears to be what causes Encoding.GetString to return an empty string. When it is told where the data ends, it works.
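An equivalent way to tell the decoder where the data ends is to shrink the buffer to the number of bytes actually read. The following is only a minimal sketch of that idea, assuming Delphi 2009 with Classes and SysUtils in the uses clause; ReadHeadAsString and MaxBytes are hypothetical names introduced for illustration and are not part of my original code.

```delphi
// Requires Classes (TStream) and SysUtils (TBytes, TEncoding) in the uses clause.
// ReadHeadAsString and MaxBytes are hypothetical names used only in this sketch.
function ReadHeadAsString(Stream: TStream; MaxBytes: Integer): string;
var
  Buffer: TBytes;
  BytesRead, BOMLength: Integer;
  Encoding: TEncoding;
begin
  Result := '';
  if MaxBytes <= 0 then
    Exit;

  SetLength(Buffer, MaxBytes);
  BytesRead := Stream.Read(Buffer[0], MaxBytes);
  if BytesRead <= 0 then
    Exit;

  // Drop the unfilled tail so no padding zeros are passed to the decoder.
  SetLength(Buffer, BytesRead);

  // Encoding must be nil so that GetBufferEncoding detects it from the BOM.
  Encoding := nil;
  BOMLength := TEncoding.GetBufferEncoding(Buffer, Encoding);

  // Skip the BOM and decode only the bytes that were read.
  Result := Encoding.GetString(Buffer, BOMLength, Length(Buffer) - BOMLength);
end;
```

With this helper, the call would look like Value := ReadHeadAsString(CurFileStream, 2048). Note that for files larger than the buffer, a fixed-size read can still end in the middle of a multi-byte sequence, which this sketch does not handle.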

Upvotes: 4

Views: 1931

Answers (1)

Remy Lebeau

Reputation: 598174

GetString() returns an empty string (instead of raising an exception) if the source bytes are empty, or if it fails to decode the bytes. In your case, you are not telling GetString() to skip the BOM or to ignore the unfilled portion of the buffer. Also, make sure that Encoding is initially nil before calling GetBufferEncoding().

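var
  BOMLength: Integer;
  Buffer: TBytes;
  BufLength: Integer;
  Encoding: TEncoding;
  Value: string;
begin
  SetLength(Buffer, 2048);
  // Read() returns the number of bytes actually read, which may be less than 2048.
  BufLength := CurFileStream.Read(Buffer[0], Length(Buffer));

  // Encoding must be nil so that GetBufferEncoding() detects it from the BOM.
  Encoding := nil;
  BOMLength := TEncoding.GetBufferEncoding(Buffer, Encoding);
  // Skip the BOM and decode only the bytes that were read.
  Value := Encoding.GetString(Buffer, BOMLength, BufLength - BOMLength);
end;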

If that still does not work then the source data most likely has an illegal byte in it.
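To look for such a byte, one option is to scan the buffer yourself. The function below is a rough sketch, not an RTL routine: FirstInvalidUtf8Index is a hypothetical helper that only checks the structure of UTF-8 sequences (lead byte followed by the right number of continuation bytes). It does not reject overlong encodings or surrogate code points, so it may accept some input that a strict decoder refuses.

```delphi
// Requires SysUtils (TBytes) in the uses clause.
// Hypothetical helper: returns the index of the first byte that starts a
// structurally invalid UTF-8 sequence in Bytes[Start .. Start + Count - 1],
// or -1 if every sequence in that range is well formed.
function FirstInvalidUtf8Index(const Bytes: TBytes; Start, Count: Integer): Integer;
var
  I, K, Stop, Need: Integer;
  B: Byte;
begin
  I := Start;
  Stop := Start + Count;
  while I < Stop do
  begin
    B := Bytes[I];
    // Decide how many continuation bytes the lead byte calls for.
    if B < $80 then
      Need := 0
    else if (B and $E0) = $C0 then
      Need := 1
    else if (B and $F0) = $E0 then
      Need := 2
    else if (B and $F8) = $F0 then
      Need := 3
    else
    begin
      // Stray continuation byte or an invalid lead byte.
      Result := I;
      Exit;
    end;

    // The sequence is cut off by the end of the range.
    if I + Need >= Stop then
    begin
      Result := I;
      Exit;
    end;

    // Every continuation byte must have the bit pattern 10xxxxxx.
    for K := 1 to Need do
      if (Bytes[I + K] and $C0) <> $80 then
      begin
        Result := I;
        Exit;
      end;

    Inc(I, Need + 1);
  end;
  Result := -1;
end;
```

With the variables from the code above, FirstInvalidUtf8Index(Buffer, BOMLength, BufLength - BOMLength) would return -1 when the data after the BOM is structurally valid, and otherwise the offset to inspect in a hex viewer. A hit at the very end of a full 2048-byte read may only mean that the read split a multi-byte character.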

Upvotes: 5
