Reputation: 34099
When I type this in iex:
iex> 'hełło'
it shows me the code points:
[104, 101, 322, 322, 111]
I know this is because single-quoted strings represent char lists. But when I type the list with the above numbers into iex, it shows me the list of numbers instead of hełło:
iex(13)> [104, 101, 322, 322, 111]
[104, 101, 322, 322, 111]
Why does it not show me the characters?
When I type
iex(3)> a = [67,55,44]
into iex, I get the following characters:
'C7,'
What if I want iex to show me numbers instead of characters?
Also, why can I pass an atom as the argument here?
iex> to_string :hello
"hello"
Upvotes: 1
Views: 1406
Reputation: 54714
When you inspect a list (or look at the return value in iex), Elixir checks whether the list contains only printable characters (in practice, printable ASCII). If it does, the list is printed in its string representation; otherwise it is printed as a list of integers. Char lists are also just lists of integers, so the same rules apply to them. The following examples show that char lists are really just lists:
# '' for example is the same as an empty list []
iex> ''
[]
# a char list with only printable characters will be printed as a string
iex> 'A'
'A'
# a char list with a non-printable element will be printed as a list
iex> 'A' ++ [0]
[65, 0]
# a plain list with only printable characters will also be printed as a string
iex> [65]
'A'
That means char lists are nothing special, just lists of integers. Now it so happens that char lists containing characters outside the printable ASCII range are not displayed as text. Char lists are mainly there for Erlang interoperability, because we need a way to convert "Erlang strings" back and forth. If I'm not mistaken, Erlang doesn't print such characters in char lists by default either, so it might be implemented this way for historical reasons.
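The printing rule can be checked without going through inspect. This is a minimal sketch, assuming Elixir 1.6 or later, where List.ascii_printable?/1 is available; as far as I know inspect relies on the same kind of check, but treat that as an assumption rather than a guarantee.

```elixir
# List.ascii_printable?/1 returns true only if every element is a
# printable ASCII character (or a common escape such as \n or \t).
printable = List.ascii_printable?([67, 55, 44])
non_ascii = List.ascii_printable?([104, 101, 322, 322, 111])
with_null = List.ascii_printable?([65, 0])

IO.inspect(printable)  # [67, 55, 44] is "C7," so this is true
IO.inspect(non_ascii)  # 322 is outside ASCII, so this is false
IO.inspect(with_null)  # 0 is not printable, so this is false
```

This matches the behaviour in the question: [67, 55, 44] is shown as text, while the list containing 322 stays a list of integers.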
However, Elixir is nice enough to convert non-ASCII characters in a char list literal to the appropriate code points, so you can later convert it to a binary and get the proper UTF-8 encoded characters:
# code points outside printable ASCII are printed as a list of integers
iex> 'hełło'
[104, 101, 322, 322, 111]
# however, you can convert a list of Unicode code points to a binary
iex> to_string('hełło')
"hełło"
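The conversion works in both directions. The sketch below assumes Elixir 1.3 or later, where the function is named String.to_charlist/1 (older releases called it String.to_char_list/1); the variable names are only illustrative.

```elixir
# Binary -> char list: one integer per Unicode code point.
charlist = String.to_charlist("hełło")

# Char list -> binary: the code points are encoded as UTF-8.
binary = List.to_string(charlist)

# to_string/1 dispatches to the same conversion for lists.
same = to_string(charlist) == binary

IO.inspect(charlist, charlists: :as_lists)  # [104, 101, 322, 322, 111]
IO.puts(binary)                             # hełło
IO.inspect(same)                            # true
```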
IEx internally uses the Inspect protocol to print return values. However, you can pass additional options if you call inspect manually. For example, to see the code points of a char list:
iex> IO.puts inspect('hello', charlists: :as_lists)
[104, 101, 108, 108, 111]
:ok
And if you want to see the bytes of a binary:
iex> IO.puts inspect("hello", binaries: :as_binaries)
<<104, 101, 108, 108, 111>>
:ok
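If you want iex itself to always show numbers instead of characters, the same inspect options can be set for the whole session. This is a sketch assuming a reasonably recent Elixir version, where the option is spelled charlists and IEx.configure/1 accepts an :inspect keyword list; older releases spelled the option char_lists.

```elixir
# Run this in an iex session, or put it in a .iex.exs file so it is
# applied every time iex starts.
IEx.configure(inspect: [charlists: :as_lists])
```

After this, typing a = [67, 55, 44] should echo the list of integers rather than text.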
For more options, check h Inspect.Opts in iex. This technique lets us clearly see how the same character is represented in char lists and in binaries. The difference is that a char list represents one character as one integer (its code point), whereas a binary stores each code point UTF-8 encoded, which can take multiple bytes:
iex> IO.puts inspect('ł', charlists: :as_lists)
[322]
:ok
iex> IO.puts inspect("ł", binaries: :as_binaries)
<<197, 130>>
:ok
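The same difference shows up when you count characters and bytes. A small sketch using only standard library functions; the variable names are made up for illustration.

```elixir
string = "ł"

# One code point...
code_points = String.to_charlist(string)
char_count = String.length(string)

# ...stored as two UTF-8 bytes in the binary.
bytes = :binary.bin_to_list(string)
byte_count = byte_size(string)

# The ::utf8 modifier encodes a code point into its UTF-8 bytes.
rebuilt = <<322::utf8>>

IO.inspect(code_points, charlists: :as_lists)  # [322]
IO.inspect(bytes, charlists: :as_lists)        # [197, 130]
IO.inspect({char_count, byte_count})           # {1, 2}
IO.inspect(rebuilt == string)                  # true
```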
That said, you should really, really use binaries instead of char lists if you stay within Elixir. Char lists are generally only useful to interact with Erlang code that uses them.
Upvotes: 6