mljrg

Length and size of strings in Elixir / Erlang needs explanation

Can someone explain why s is a string with 4096 chars

iex(9)> s = String.duplicate("x", 4096)
... lots of "x"
iex(10)> String.length(s)
4096

but its memory size is only 6 words?

iex(11)> :erts_debug.size(s)
6 # WHAT?!

And why s2 is a much shorter string than s

iex(13)> s2 = "1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20"
"1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20"
iex(14)> String.length(s2)
50

but its size is 3 words more than that of s?

iex(15)> :erts_debug.size(s2)
9 # WHAT!?

And why don't the sizes of these strings match their lengths?

Thanks


Answers (1)

Hauleth

A first clue to why it shows these values can be found in this question. Quoting the :erts_debug.size/1 docs:

%% size(Term)
%%  Returns the size of Term in actual heap words. Shared subterms are
%%  counted once.  Example: If A = [a,b], B =[A,A] then size(B) returns 8,
%%  while flat_size(B) returns 12.
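To see the "shared subterms are counted once" rule in action, here is a minimal Elixir sketch of the same example from the quoted docs, using integers instead of atoms. The inner list is built at runtime, on the assumption that a compile-time literal might not end up shared in memory.

```elixir
# Build `a` at runtime so it is a single term on the heap that `b` points to twice.
a = Enum.to_list(1..2)   # 2 cons cells = 4 words; small integers are immediates
b = [a, a]               # 2 more cons cells (4 words) whose heads both point to `a`

# size/1 counts the shared `a` only once: 4 + 4 = 8 words
IO.inspect(:erts_debug.size(b), label: "size")
# flat_size/1 counts `a` once per reference: 4 + 4 + 4 = 12 words
IO.inspect(:erts_debug.flat_size(b), label: "flat_size")
```

This matches the numbers in the docs (8 and 12): size/1 reports how much heap the term really occupies, while flat_size/1 reports how much it would occupy if copied without sharing, for example when sent to another process.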

The second clue can be found in the Erlang documentation about how binaries (bitstrings) are implemented.


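So in the first case the string (4096 bytes) is too big to be stored on the process heap, so it is stored as a reference-counted (refc) binary: the bytes live in a shared area outside all process heaps, and the process heap holds only a small object (a ProcBin) that points to that binary. :erts_debug.size/1 counts only that heap object, which is why it reports just 6 words.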
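A quick way to check this is to compare two refc binaries of very different lengths. This is a sketch: the exact number of heap words is a VM implementation detail (the question shows 6) and may differ between OTP releases, but it should not grow with the length of the string.

```elixir
# Both strings are well over 64 bytes, so both should be refc binaries:
# the bytes are stored off-heap and only a fixed-size handle sits on the heap.
small = String.duplicate("x", 4096)
large = String.duplicate("x", 1_000_000)

IO.inspect(byte_size(small), label: "bytes (small)")
IO.inspect(byte_size(large), label: "bytes (large)")

# :erts_debug.size/1 counts heap words only, so both report the same small number.
IO.inspect(:erts_debug.size(small), label: "heap words (small)")
IO.inspect(:erts_debug.size(large), label: "heap words (large)")

# The off-heap data still exists; :binary.referenced_byte_size/1 reports its size.
IO.inspect(:binary.referenced_byte_size(small), label: "referenced bytes (small)")
```

The last line is a reminder that a small heap size does not mean the string is cheap: the 4096 bytes are still allocated, just not on the process heap.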

In the second case the string is only 50 bytes, and binaries of up to 64 bytes are stored as heap binaries, which are just arrays of bytes stored directly on the process heap. On a 64-bit machine a word is 8 bytes, so the 50 bytes of data need 7 words (rounded up), plus a small header (2 words here), which gives the 9 words reported. When we check the documentation about the exact memory overhead in the VM, we see that Erlang uses 3..6 words per binary + data, where the data can be shared.
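The 9-word figure can be reproduced with a little arithmetic. The module below (HeapBinaryWords is a hypothetical name, not part of any library) assumes a 64-bit VM and a heap binary layout of 2 header words followed by the data rounded up to whole words. That layout is an assumption that matches the question's output, not a documented API.

```elixir
defmodule HeapBinaryWords do
  # Hypothetical helper. Assumes a 64-bit VM (8-byte words) and a heap binary
  # made of 2 header words plus the data rounded up to whole words.
  @word_bytes 8
  @header_words 2

  # Only meaningful for heap binaries, i.e. binaries of at most 64 bytes.
  def estimate(bytes) when is_integer(bytes) and bytes >= 0 and bytes <= 64 do
    @header_words + div(bytes + @word_bytes - 1, @word_bytes)
  end
end

s2 = "1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20"

IO.inspect(byte_size(s2), label: "bytes")                              # 50 ASCII characters = 50 bytes
IO.inspect(HeapBinaryWords.estimate(byte_size(s2)), label: "estimate") # 2 + ceil(50 / 8) = 9
IO.inspect(:erts_debug.size(s2), label: ":erts_debug.size/1")
```

On the VM used in the question the estimate and :erts_debug.size/1 agree at 9 words; on other OTP releases or a 32-bit VM the real layout may differ, so treat the helper as an illustration of the arithmetic rather than a guarantee.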
