Reputation: 405
Suppose there is a UTF-8-encoded file:
file1.txt
汉字
whose binary representation is:
11100110 10110001 10001001 11100101 10101101 10010111
If I open it in an editor, the editor reads the bit sequence and decodes it. I can see 汉字 in the editor, and 汉字 is held in memory.
Then, now
Upvotes: 0
Views: 162
Reputation: 308001
As so often, the answer is "it depends".
Generally speaking, in-memory text has to use some encoding, just like on-disk text does.
But whether that encoding is the same as the on-disk one or not depends on the application.
Some have a preferred encoding in which they represent text in memory (such as UTF-16, or even UCS-4 if they are feeling wasteful); others hold it in memory in the same encoding as used on disk and just interpret it as necessary when rendering or searching.
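To make that concrete, here is a small Python sketch that takes the six bytes from the question, decodes them, and re-encodes the text in a few encodings an application could plausibly use in memory. The variable names are made up for illustration, and the little-endian byte order is just an assumption; it is not meant to show what any particular editor actually does.

```python
# The six bytes from the question, as they sit on disk (UTF-8).
on_disk = bytes([0b11100110, 0b10110001, 0b10001001,
                 0b11100101, 0b10101101, 0b10010111])

# Decoding turns the bytes into abstract characters.
text = on_disk.decode("utf-8")

# Candidate in-memory representations of the same two characters.
as_utf8 = text.encode("utf-8")       # identical to the on-disk bytes
as_utf16 = text.encode("utf-16-le")  # 2 bytes per character for this text
as_utf32 = text.encode("utf-32-le")  # 4 bytes per character (UCS-4)

for name, data in [("UTF-8", as_utf8), ("UTF-16", as_utf16), ("UTF-32", as_utf32)]:
    print(f"{name}: {len(data)} bytes: {data.hex(' ')}")
```

The characters are the same in all three cases; only the byte layout, and therefore the memory footprint, differs.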
There's no universal rule that requires one approach or another. Some languages/platforms have a strong preference.
For example, Java uses UTF-16 for in-memory String objects (except that, as an internal optimization, it might sometimes use Latin-1 if the text allows it).
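The idea behind that optimization can be sketched roughly like this in Python. To be clear, this is not JVM code: `pick_storage` is a hypothetical helper, and the rule it applies (one byte per character when every code point fits in Latin-1, UTF-16 otherwise) is only a simplified illustration of the trade-off.

```python
def pick_storage(text):
    """Hypothetical helper: pick a compact in-memory form for `text`.

    Returns (encoding_name, encoded_bytes). Illustration only.
    """
    # Latin-1 covers code points U+0000..U+00FF with one byte each.
    if all(ord(ch) <= 0xFF for ch in text):
        return "latin-1", text.encode("latin-1")
    # Anything else falls back to UTF-16 (little-endian assumed here).
    return "utf-16", text.encode("utf-16-le")


for sample in ["hello", "café", "汉字"]:
    name, data = pick_storage(sample)
    print(f"{sample!r}: {name}, {len(data)} bytes")
```

With this rule, plain Western text costs one byte per character, while the 汉字 from the question needs the two-byte UTF-16 form.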
Upvotes: 2