Blow ThemUp

Reputation: 905

Reading a text file as bytes (byte by byte) using Delphi 2010

I would like to read a UTF-8 text file byte by byte and get the numeric (decimal) value of each byte in the file. Can this be done? If so, what is the best method?

My goal is then to replace certain 2-byte combinations with a single byte (I have a predefined set of these mappings).

For example, if I find a 197 followed by a 158 (decimal values), I will replace the pair with the single byte 17.
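Why the two bytes belong together: 197,158 is the two-byte UTF-8 encoding of a single code point, 350 (U+015E, Ş), the same character the question refers to further on. UTF-8 stores code points from 128 to 2047 as a lead byte carrying the top 5 bits and a continuation byte carrying the low 6 bits:

```latex
b_1 = 192 + \left\lfloor \frac{c}{64} \right\rfloor, \qquad b_2 = 128 + (c \bmod 64)

c = 350: \quad b_1 = 192 + 5 = 197, \qquad b_2 = 128 + 30 = 158
```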

I don't want to use the standard Delphi file I/O routines:

AssignFile
Reset
Rewrite
ReadLn
WriteLn
CloseFile

Is there a better method? Can this be done using TStream (Reader & Writer)?
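A minimal sketch of the TStream route, assuming Delphi 2009 or later (for TBytes) and a file small enough to load into memory in one go. It reads the whole file into a byte array with TFileStream, replaces every 197,158 pair with 17, and writes the result to a second file. The program name, the ReplaceBytePair helper and the command-line layout are illustrative, not taken from the question. Because 197 ($C5) can only ever be a lead byte in valid UTF-8, a plain byte-pair search cannot accidentally match across two different characters.

```pascal
program ReplaceBytePairs;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Hypothetical helper: returns a copy of Data in which every
// First,Second byte pair is replaced by the single byte Replacement.
function ReplaceBytePair(const Data: TBytes; First, Second, Replacement: Byte): TBytes;
var
  I, J: Integer;
begin
  SetLength(Result, Length(Data)); // output is never longer than the input
  I := 0;
  J := 0;
  while I < Length(Data) do
  begin
    if (I + 1 < Length(Data)) and (Data[I] = First) and (Data[I + 1] = Second) then
    begin
      Result[J] := Replacement;
      Inc(I, 2); // skip both bytes of the matched pair
    end
    else
    begin
      Result[J] := Data[I];
      Inc(I);
    end;
    Inc(J);
  end;
  SetLength(Result, J); // trim to the number of bytes actually written
end;

procedure ProcessFile(const InName, OutName: string);
var
  InStream, OutStream: TFileStream;
  Data, Converted: TBytes;
begin
  // Read the entire input file into memory
  InStream := TFileStream.Create(InName, fmOpenRead or fmShareDenyWrite);
  try
    SetLength(Data, Integer(InStream.Size));
    if Length(Data) > 0 then
      InStream.ReadBuffer(Data[0], Length(Data));
  finally
    InStream.Free;
  end;

  // Separate result variable, so the function never aliases its input
  Converted := ReplaceBytePair(Data, 197, 158, 17);

  // Write the converted bytes to the output file
  OutStream := TFileStream.Create(OutName, fmCreate);
  try
    if Length(Converted) > 0 then
      OutStream.WriteBuffer(Converted[0], Length(Converted));
  finally
    OutStream.Free;
  end;
end;

begin
  if ParamCount < 2 then
  begin
    Writeln('Usage: ReplaceBytePairs <input> <output>');
    Halt(1);
  end;
  ProcessFile(ParamStr(1), ParamStr(2));
end.
```

Loading everything up front avoids one Read call per byte and means a pair can never be split across two buffer reads. To support the other prepared mappings, the single First/Second check could be swapped for a small lookup table.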

Here is an example test I am using. I know there is a character (code point 350, Ş) that takes two bytes and starts in column 84. When viewed in a hex editor, the character consists of the bytes 197 and 158, so I am trying to find the 197 using my Delphi code, but I can't seem to find it.

FS1:= TFileStream.Create(ParamStr1, fmOpenRead);
try
 FS1.Seek(0, soBeginning);
 FS1.Position:= FS1.Position + 84;
 FS1.Read(B, SizeOf(B));
 if ord(B) = 197 then showMessage('True') else ShowMessage('False');
finally
 FS1.Free;
end;
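The question doesn't show how B is declared. If it is a Char, that is a likely culprit: in Delphi 2009 and later Char is WideChar (2 bytes), so FS1.Read(B, SizeOf(B)) reads two bytes, and for the bytes 197,158 Ord(B) would be 40645 ($9EC5), never 197. Position is also zero-based, so Position := 84 reads the 85th byte; a character in 1-based column 84 would start at offset 83, shifted further by a 3-byte BOM (EF BB BF) if the file has one, and by any earlier multi-byte characters. This sketch sidesteps the offset arithmetic by scanning for the pair with a Byte variable; FindBytePair is a hypothetical helper name.

```pascal
program FindPair;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Returns the zero-based offset of the first First,Second byte pair, or -1.
function FindBytePair(Stream: TStream; First, Second: Byte): Int64;
var
  Prev, B: Byte; // Byte, not Char: Char is 2 bytes (WideChar) since Delphi 2009
  HavePrev: Boolean;
begin
  Result := -1;
  Prev := 0;
  HavePrev := False;
  Stream.Position := 0;
  while Stream.Read(B, SizeOf(B)) = SizeOf(B) do
  begin
    if HavePrev and (Prev = First) and (B = Second) then
    begin
      Result := Stream.Position - 2; // offset of the first byte of the pair
      Exit;
    end;
    Prev := B;
    HavePrev := True;
  end;
end;

var
  FS: TFileStream;
  Offset: Int64;
begin
  FS := TFileStream.Create(ParamStr(1), fmOpenRead or fmShareDenyWrite);
  try
    Offset := FindBytePair(FS, 197, 158);
    if Offset >= 0 then
      Writeln('Found 197,158 at zero-based offset ', Offset)
    else
      Writeln('Pair not found');
  finally
    FS.Free;
  end;
end.
```

Reading one byte per Read call is fine for a quick probe like this but slow on large files; for real processing, load the file into a buffer first.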

Upvotes: 0

Views: 2411

Answers (3)

Remy Lebeau

Reputation: 595319

You asked the same question 5 hours later in another topic; the answer there better addresses your specific question:

Replacing a unicode character in UTF-8 file using delphi 2010

Upvotes: 0

David Heffernan

Reputation: 612794

My understanding is that you want to convert a text file from UTF-8 to ASCII. That's quite simple:

StringList.LoadFromFile(UTF8FileName, TEncoding.UTF8);
StringList.SaveToFile(ASCIIFileName, TEncoding.ASCII);

The runtime library comes with all sorts of functionality to convert between different text encodings. Surely you don't want to attempt to replicate this functionality yourself?

I trust you realise that this conversion is liable to lose data. Characters with ordinal greater than 127 cannot be represented in ASCII. In fact every code point that requires more than 1 octet in UTF-8 cannot be represented in ASCII.
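The two-line conversion from this answer, wrapped into a complete console program with the TStringList created and freed. ConvertUtf8ToAscii is a hypothetical name. Since TStringList splits the text into lines and joins them again on save, line breaks in the output may not match the input exactly, and what the ASCII encoder substitutes for characters above 127 is left to the RTL/OS conversion, so it is worth inspecting the output rather than assuming.

```pascal
program Utf8ToAscii;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

procedure ConvertUtf8ToAscii(const UTF8FileName, ASCIIFileName: string);
var
  StringList: TStringList;
begin
  StringList := TStringList.Create;
  try
    // Decode the file as UTF-8 into (UTF-16) strings
    StringList.LoadFromFile(UTF8FileName, TEncoding.UTF8);
    // Re-encode as ASCII; characters above 127 cannot survive this step
    StringList.SaveToFile(ASCIIFileName, TEncoding.ASCII);
  finally
    StringList.Free;
  end;
end;

begin
  ConvertUtf8ToAscii(ParamStr(1), ParamStr(2));
end.
```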

Upvotes: 3

Nickolay Olshevsky

Reputation: 14160

You can use TFileStream to read all the data from the file into, for instance, an array of bytes, and then check for UTF-8 sequences. Also note that a UTF-8 sequence can contain more than 2 bytes (up to 4).
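A sketch of the "check for UTF-8 sequences" step: the lead byte of each sequence tells you its length (1 to 4 bytes, per RFC 3629), so the array can be walked sequence by sequence instead of byte by byte. Utf8SequenceLength and ListMultiByteSequences are illustrative names, and the code only inspects lead bytes; it does not validate the continuation bytes.

```pascal
program Utf8Scan;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Number of bytes in the UTF-8 sequence that starts with Lead,
// or 0 if Lead cannot start a sequence.
function Utf8SequenceLength(Lead: Byte): Integer;
begin
  case Lead of
    $00..$7F: Result := 1; // plain ASCII
    $C2..$DF: Result := 2; // e.g. 197 ($C5) starts a 2-byte sequence
    $E0..$EF: Result := 3;
    $F0..$F4: Result := 4;
  else
    Result := 0; // continuation byte ($80..$BF) or never-valid lead ($C0, $C1, $F5..$FF)
  end;
end;

procedure ListMultiByteSequences(const Data: TBytes);
var
  I, Len: Integer;
begin
  I := 0;
  while I < Length(Data) do
  begin
    Len := Utf8SequenceLength(Data[I]);
    if Len = 0 then
    begin
      Writeln('Invalid lead byte ', Data[I], ' at offset ', I);
      Len := 1; // skip it and try to resynchronise on the next byte
    end
    else if Len > 1 then
      Writeln(Len, '-byte sequence at offset ', I, ' starting with ', Data[I]);
    Inc(I, Len);
  end;
end;

var
  FS: TFileStream;
  Data: TBytes;
begin
  FS := TFileStream.Create(ParamStr(1), fmOpenRead or fmShareDenyWrite);
  try
    SetLength(Data, Integer(FS.Size));
    if Length(Data) > 0 then
      FS.ReadBuffer(Data[0], Length(Data));
  finally
    FS.Free;
  end;
  ListMultiByteSequences(Data);
end.
```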

Also, Delphi has a function Utf8ToUnicode, which converts UTF-8 data into usable Unicode text.
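Utf8ToUnicode works on raw buffers. A swapped-in alternative that is often simpler in Delphi 2009 and later is TEncoding.UTF8.GetString, which returns a string directly. Once decoded, the 197,158 pair from the question is a single character, #$015E (code point 350), so the replacement can be done on characters and the result re-encoded; because #17 is below 128, UTF-8 encodes it as the single byte 17. ReplaceInUtf8 is a hypothetical helper, and the input bytes in the demo are made up for illustration.

```pascal
program DecodeAndReplace;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

function ReplaceInUtf8(const Data: TBytes): TBytes;
var
  S: string;
begin
  // Decode UTF-8 into a UTF-16 string: 197,158 becomes the one Char #$015E
  S := TEncoding.UTF8.GetString(Data);
  // Replace characters rather than byte pairs
  S := StringReplace(S, #$015E, #17, [rfReplaceAll]);
  // Re-encode as UTF-8 (GetBytes adds no BOM); #17 becomes the single byte 17
  Result := TEncoding.UTF8.GetBytes(S);
end;

var
  Source, Converted: TBytes;
  I: Integer;
begin
  // 'A', then the two-byte UTF-8 encoding of U+015E, then 'B'
  SetLength(Source, 4);
  Source[0] := 65;
  Source[1] := 197;
  Source[2] := 158;
  Source[3] := 66;
  Converted := ReplaceInUtf8(Source);
  for I := 0 to High(Converted) do
    Write(Converted[I], ' ');
  Writeln; // prints 65 17 66
end.
```

Working on decoded characters means multi-byte sequences of any length are handled by the RTL, at the cost of a decode/encode round trip.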

Upvotes: 4
