user1009073

Reputation: 3238

Delphi - Removing specific hex value from string

Delphi Tokyo: I have a text file (specifically, a CSV file) that I am reading line by line using TextFile operations. The first three bytes of the file contain some type of header data that I am not interested in. While I think this will be the case in all files, I want to verify that before I delete it. In short, I want to read the line, compare its first three bytes to three hex values, and, if they match, delete those 3 bytes.

When I look at the file in a hex editor, I see

EF BB BF ...

For whatever reason, my comparison is NOT working. Here is a code fragment.

var
  LeadingBadBytes: String;
begin
  // Open file, and read first line into variable TriggerHeader
  ...
  LeadingBadBytes := '$EFBBBF';
  if AnsiPos(LeadingBadBytes, TriggerHeader) = 1 then
    Delete(TriggerHeader, 1, 3);

The Delete call by itself works fine, but I cannot get the AnsiPos comparison to work. What should I be doing differently?

Upvotes: 0

Views: 1040

Answers (1)

Remy Lebeau

Reputation: 595827

The bytes EF BB BF are a UTF-8 BOM, which identifies the file as Unicode text encoded in UTF-8. They only appear at the beginning of the file, not on every line.

Your comparison does not work because you are comparing the read string to the literal string '$EFBBBF', not to the byte sequence EF BB BF.
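
To make the difference concrete, here is a minimal sketch (the program and variable names are made up for illustration). It only shows that the quoted literal is seven ordinary characters, starting with a dollar sign, while a `#$` character literal denotes a single character with that code:

```delphi
program LiteralVsCodepoint;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

var
  QuotedLiteral, BomChar: string;
begin
  // Inside quotes, '$' is just a character: this is the text "$EFBBBF".
  QuotedLiteral := '$EFBBBF';
  // Outside quotes, #$XXXX is a character literal given by its hex code.
  BomChar := #$FEFF;

  Writeln(Length(QuotedLiteral)); // 7 characters
  Writeln(Length(BomChar));       // 1 character
end.
```

So AnsiPos was searching the line for the seven-character text `$EFBBBF`, which a CSV line will almost never contain.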

Change this:

LeadingBadBytes := '$EFBBBF';
...
Delete(TriggerHeader, 1, 3);

To this:

LeadingBadBytes := #$FEFF; // EF BB BF is the UTF-8 encoded form of Unicode codepoint U+FEFF...
...
Delete(TriggerHeader, 1, 1); // or Delete(..., Length(LeadingBadBytes))

Also, consider using StrUtils.StartsText(...) instead of AnsiPos(...) = 1.
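
Putting the pieces together, a hedged sketch of a helper might look like the following. `StripLeadingBom` and `BomChar` are hypothetical names, and `StartsStr` (the case-sensitive sibling of `StartsText`) is used because case does not matter for a BOM. It assumes the line was decoded as UTF-8, so the BOM arrives as the single character U+FEFF. If the file was instead read with an ANSI code page, the three bytes would likely arrive as three separate characters, and this check would not match.

```delphi
program StripBomDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.StrUtils;

const
  // U+FEFF, the codepoint whose UTF-8 encoding is EF BB BF.
  BomChar: string = #$FEFF;

// Returns S without a leading U+FEFF; other strings are returned unchanged.
function StripLeadingBom(const S: string): string;
begin
  Result := S;
  if StartsStr(BomChar, Result) then
    Delete(Result, 1, Length(BomChar));
end;

begin
  // Sample header lines, with and without a leading BOM character.
  Writeln(StripLeadingBom(#$FEFF + 'ID,Name'));
  Writeln(StripLeadingBom('ID,Name'));
end.
```

Because the BOM can only appear at the start of the file, a helper like this only needs to be applied to the first line read.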

That being said, modern versions of Delphi should handle the BOM for you, so you shouldn't be receiving it in the read data at all. But you said you are using a TextFile, which is not BOM-aware, AFAIK. You should not be using outdated Pascal-style file I/O to begin with. Try the more modern Delphi RTL I/O classes instead, such as TStringList or TStreamReader, which are BOM-aware.
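
As a rough sketch of the TStreamReader route, assuming the file is UTF-8 and using a placeholder file name (`triggers.csv` is not from the question):

```delphi
program ReadCsvWithStreamReader;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes;

var
  Reader: TStreamReader;
  Line: string;
begin
  // 'triggers.csv' is a placeholder; substitute the real path.
  // The third argument (DetectBOM = True) asks the reader to check for a BOM,
  // so the BOM should not show up in the first line returned by ReadLine.
  Reader := TStreamReader.Create('triggers.csv', TEncoding.UTF8, True);
  try
    while not Reader.EndOfStream do
    begin
      Line := Reader.ReadLine;
      Writeln(Line); // process the CSV line here
    end;
  finally
    Reader.Free;
  end;
end.
```

With this approach there is nothing to compare or delete manually. TStringList.LoadFromFile is an alternative when the whole file fits comfortably in memory.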

Upvotes: 7
