I am trying to replace the character with decimal value 197 in a UTF-8 file with the character with decimal value 65.
I can load the file and put it in a string (though I may not need to do that):
SS := TStringStream.Create(ParamStr1, TEncoding.UTF8);
SS.LoadFromFile(ParamStr1);
//S:= SS.DataString;
//ShowMessage(S);
However, how do I replace all 197's with 65's and save the result back out as UTF-8?
SS.SaveToFile(ParamStr2);
SS.Free;
-------------- EDIT ----------------
reader:= TStreamReader.Create(ParamStr1, TEncoding.UTF8);
writer:= TStreamWriter.Create(ParamStr2, False, TEncoding.UTF8);
while not Reader.EndOfStream do
begin
S:= reader.ReadLine;
for I:= 1 to Length(S) do
begin
if Ord(S[I]) = 350 then
begin
Delete(S,I,1);
Insert('A',S,I);
end;
end;
writer.Write(S + #13#10);
end;
writer.Free;
reader.Free;
Decimal 197 is hex C5, and decimal 65 is hex 41. C5 is not a valid UTF-8 octet by itself, but 41 is. So I have to assume you are actually referring to Unicode codepoints U+00C5 LATIN CAPITAL LETTER A WITH RING ABOVE and U+0041 LATIN CAPITAL LETTER A instead.
U+00C5 is encoded in UTF-8 as C3 85, and U+0041 is encoded as 41. To do what you are asking, you have to decode the UTF-8, replace the codepoints, then re-encode back to UTF-8. StringReplace() will work just fine for that, e.g.:
SS := TStringStream.Create('', TEncoding.UTF8);
SS.LoadFromFile(ParamStr1);
S := StringReplace(SS.DataString, 'Å', 'A', [rfReplaceAll]);
SS2 := TStringStream.Create(S, TEncoding.UTF8);
SS2.SaveToFile(ParamStr2);
SS2.Free;
SS.Free;
Or:
reader := TStreamReader.Create(ParamStr1, TEncoding.UTF8);
writer := TStreamWriter.Create(ParamStr2, False, TEncoding.UTF8);
while not Reader.EndOfStream do
begin
S := reader.ReadLine;
S := StringReplace(S, 'Å', 'A', [rfReplaceAll]);
writer.WriteLine(S);
end;
writer.Free;
reader.Free;
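For reference, here is a minimal sketch of the line-by-line approach as a complete console program. It assumes Delphi XE2 or later (for the unit-scoped names System.SysUtils and System.Classes) and assumes the input and output paths are passed as the first two command-line arguments. It adds try/finally so both objects are freed even if an exception occurs, and writes the character as the escape #$00C5 instead of the literal 'Å' so the code does not depend on the encoding the .pas file itself is saved in.

```pascal
program ReplaceChar;

{$APPTYPE CONSOLE}

// Sketch only: assumes Delphi XE2+ and that ParamStr(1)/ParamStr(2)
// hold the input and output file paths.
uses
  System.SysUtils, System.Classes;

var
  Reader: TStreamReader;
  Writer: TStreamWriter;
  S: string;
begin
  Reader := TStreamReader.Create(ParamStr(1), TEncoding.UTF8);
  try
    Writer := TStreamWriter.Create(ParamStr(2), False, TEncoding.UTF8);
    try
      while not Reader.EndOfStream do
      begin
        // ReadLine returns already-decoded text, so we compare code points, not bytes
        S := Reader.ReadLine;
        // #$00C5 is U+00C5 ('Å'); the escape avoids depending on the source file's encoding
        S := StringReplace(S, #$00C5, 'A', [rfReplaceAll]);
        // WriteLine re-encodes the text as UTF-8 and appends a line break
        Writer.WriteLine(S);
      end;
    finally
      Writer.Free;
    end;
  finally
    Reader.Free;
  end;
end.
```

If the character you actually need to replace is Ş (U+015E), change #$00C5 to #$015E.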
Update: based on other comments, it looks like you are not actually interested in Unicode codepoint U+00C5 LATIN CAPITAL LETTER A WITH RING ABOVE, but rather in U+015E LATIN CAPITAL LETTER S WITH CEDILLA, which is encoded in UTF-8 as C5 9E. If that is true (and it would explain both numbers: 197 is hex C5, the first byte of that sequence, and the 350 in your Ord() check is hex 15E), then simply replace Å with Ş when calling StringReplace() after the UTF-8 data has been decoded:
S := StringReplace(S, 'Ş', 'A', [rfReplaceAll]);
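To see where the 197 comes from, this small sketch prints the bytes that TEncoding.UTF8.GetBytes produces for each character. It assumes Delphi XE2 or later; the Utf8Hex helper is a hypothetical name used only for illustration.

```pascal
program Utf8Bytes;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Hypothetical helper: returns the UTF-8 encoding of S as space-separated hex bytes
function Utf8Hex(const S: string): string;
var
  Bytes: TBytes;
  B: Byte;
begin
  Result := '';
  Bytes := TEncoding.UTF8.GetBytes(S); // GetBytes does not add a BOM
  for B in Bytes do
    Result := Result + IntToHex(B, 2) + ' ';
  Result := Trim(Result); // drop the trailing space
end;

begin
  Writeln(Utf8Hex(#$00C5)); // Å: C3 85
  Writeln(Utf8Hex(#$015E)); // Ş: C5 9E (lead byte C5 = decimal 197)
  Writeln(Utf8Hex('A'));    // A: 41
end.
```

Only after decoding does Ş become a single character that StringReplace() can match; at the byte level it is the two-byte sequence C5 9E.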