Reputation: 55
I know it's a very dumb question, but unfortunately I couldn't figure it out on my own. I always get confused when it comes to encoding and character set topics. I'll explain what I understand of the topic, then I'll ask my questions.
When you want to save a file, you do it in a certain character encoding, meaning that each character of the file is stored as bytes according to that encoding. Right?
For example, if an html file has utf-16 encoding, does that mean that the browser uses utf-16 encoding to decode the given file to read the source code?
Does using the charset attribute in the meta element define what encoding the language (html) should use to properly display characters in the browser?
And html added an "html character reference" on its own, and it has nothing to do with unicode character codes?
Edit1:
So after @snakecharmerb's answer I realized some of my mistakes:
1- I didn't know that there is no metadata about a [text] file's encoding.
2- The charset attribute tells the browser the encoding of the file, because this information can't be inferred from the file itself (to some extent it can; see this answer).
3- A text file can only have one encoding, and if a file is encoded with utf-8 it means it follows the Unicode character set (the Universal Coded Character Set, UCS). You can't use the utf-8 encoding with another character set, and today the terms utf-8 and unicode are almost interchangeable.
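To make point 3 concrete, here is a small Python sketch (the sample string and variable names are just for illustration). It suggests that the Unicode code points stay the same while the bytes on disk depend on the chosen encoding, and that decoding with the wrong encoding gives different text.

```python
# Two Unicode characters: U+00E9 (e with acute accent) and U+20AC (euro sign).
text = "\u00e9\u20ac"

# The code points belong to the character set and do not depend on any encoding.
code_points = [hex(ord(ch)) for ch in text]

# The same characters become different byte sequences under different encodings.
utf8_bytes = text.encode("utf-8")
utf16_bytes = text.encode("utf-16-le")

print(code_points)
print(utf8_bytes.hex())
print(utf16_bytes.hex())

# Decoding with the matching encoding round-trips; a mismatched one does not.
wrong = utf8_bytes.decode("latin-1")
print(utf8_bytes.decode("utf-8") == text, wrong == text)
```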
Upvotes: 1
Views: 92
Reputation: 55699
When you want to save a file, you do it in a certain character encoding, meaning that each character of the file is stored as bytes according to that encoding. Right?
For example, if an html file has utf-16 encoding, does that mean that the browser uses utf-16 encoding to decode the given file to read the source code?
The browser will attempt to decode the page using the encoding provided in the Content-Type header in the response headers from the web server; if the header is missing or does not specify an encoding, the meta charset tag in the page will be used. If neither is specified, the browser may attempt to infer the encoding from the document content, and finally fall back to a default such as latin-1 (in practice usually windows-1252).
The W3C recommends always setting the meta tag, only setting the Content-Type header if you are sure it will be correct, and always using UTF-8 as your encoding.
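The lookup order described above can be sketched roughly in Python. This is a simplified illustration, not what any real browser runs: the function names are made up, the regular expressions are approximations, the content-sniffing step is left out, and real browsers do more (for example, they also check for a byte order mark).

```python
import re
from typing import Optional


def charset_from_content_type(header: Optional[str]) -> Optional[str]:
    """Pull the charset parameter out of a Content-Type header value, if any."""
    if not header:
        return None
    match = re.search(r'charset\s*=\s*"?([\w.:-]+)"?', header, re.IGNORECASE)
    return match.group(1).lower() if match else None


def charset_from_meta(raw: bytes) -> Optional[str]:
    """Look for a meta charset declaration near the start of the raw document."""
    # Only the first bytes are scanned, treated as ASCII-compatible text.
    head = raw[:1024].decode("ascii", errors="replace")
    match = re.search(
        r'<meta[^>]+charset\s*=\s*["\']?([\w.:-]+)', head, re.IGNORECASE
    )
    return match.group(1).lower() if match else None


def pick_encoding(
    content_type: Optional[str], raw: bytes, fallback: str = "latin-1"
) -> str:
    """Header first, then the meta tag, then a fallback (an assumed default)."""
    return (
        charset_from_content_type(content_type)
        or charset_from_meta(raw)
        or fallback
    )


page = b'<html><head><meta charset="shift_jis"></head></html>'
print(pick_encoding("text/html; charset=UTF-8", page))  # header wins
print(pick_encoding("text/html", page))                 # meta tag is used
print(pick_encoding(None, b"<p>hi</p>"))                # fallback
```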
Does using the charset attribute in the meta element define what encoding the language (html) should use to properly display characters in the browser?
And html added an "html character reference" on its own, and it has nothing to do with unicode character codes?
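On the character reference question, a short Python sketch may help. It uses the standard library's html.unescape; the sample markup is made up for illustration. Numeric references name Unicode code points, and since a reference is written with plain ASCII characters, it should resolve to the same text whichever encoding carried the file.

```python
import html

# Four references written using only ASCII characters.
markup = "&#39; &apos; &#233; &eacute;"

resolved = {}
for encoding in ("utf-8", "latin-1", "utf-16"):
    raw = markup.encode(encoding)      # bytes as they would be stored or sent
    decoded = raw.decode(encoding)     # step 1: bytes back to characters
    resolved[encoding] = html.unescape(decoded)  # step 2: resolve references

for encoding, value in resolved.items():
    print(encoding, repr(value))

# The number inside a numeric reference is the Unicode code point.
print(ord(html.unescape("&#233;")))
```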
Character references (such as &#39; or &apos;) are independent of any particular encoding, but their constituent characters will themselves be encoded and decoded.
Upvotes: 1