Reputation: 197
Suppose I read a file line by line and save the lines to a list:
intern('abcd')
lst = []
for line in f:
    lst.append(line)
and the file has five identical lines:
abcd
abcd
abcd
abcd
abcd
When the reading is completed, will there be five copies of 'abcd' in memory or just one?
Upvotes: 1
Views: 94
Reputation: 1124070
There will be 5 copies. The intern() call returns the one canonical copy of the string; it doesn't magically make all future strings with the same content interned.
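To make this concrete, here is a minimal sketch, assuming Python 3 (where the built-in moved to sys.intern) and CPython's behaviour of creating a fresh string object for each line read. The io.StringIO object is only a stand-in for the real file, and the variable names are illustrative:

```python
import io
import sys

# Stand-in for the file from the question: five identical lines.
data = "abcd\n" * 5

# Interning one string up front (note that lines read from a file
# include the trailing newline, so that is what we intern here)...
sys.intern("abcd\n")

# ...does not affect strings created later: each line read is a new object.
plain = [line for line in io.StringIO(data)]
distinct_plain = len({id(s) for s in plain})

# Only by passing every line through sys.intern() do we end up
# with a single shared object.
interned = [sys.intern(line) for line in io.StringIO(data)]
distinct_interned = len({id(s) for s in interned})

print(distinct_plain, distinct_interned)
```

Counting distinct id() values is safe here because every string is kept alive by its list while the ids are collected.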
I would not use interning for file data, however. The biggest advantage of interning strings is in performance-critical sections, where you need your dictionary lookups to be as fast as can be. Interning allows you to skip the equality test when a simple pointer comparison (an identity test) can be used instead.
Interning has a performance penalty too: each time you call intern(), the string is tested against an internal dictionary to see if the string was already interned. This requires a hash call and 0 or more equality tests (0 if the string wasn't interned before and there are no hash collisions, one or more otherwise). Calling intern() for each and every line in a file is going to be slowed down by these operations, and unless you have a massive amount of repetition, I don't think the memory gains will be all that great.
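A rough way to check whether interning would pay off for a given input is to compare the memory held by distinct string objects with and without it. The sketch below assumes Python 3 and CPython; unique_bytes and compare are hypothetical helper names, and sys.getsizeof only approximates the real footprint:

```python
import io
import sys

def unique_bytes(strings):
    """Sum sys.getsizeof over distinct objects only (distinct by identity)."""
    sizes = {}
    for s in strings:
        sizes[id(s)] = sys.getsizeof(s)
    return sum(sizes.values())

def compare(text):
    """Return (bytes without interning, bytes with interning) for the lines of text."""
    plain = list(io.StringIO(text))
    interned = [sys.intern(line) for line in io.StringIO(text)]
    return unique_bytes(plain), unique_bytes(interned)

# Massive repetition: 1000 identical lines.
repetitive = "abcd\n" * 1000
# No repetition: 1000 different lines.
varied = "".join(f"line {i}\n" for i in range(1000))

rep_plain, rep_interned = compare(repetitive)
var_plain, var_interned = compare(varied)

print("repetitive:", rep_plain, "->", rep_interned)
print("varied:    ", var_plain, "->", var_interned)
```

With heavy repetition the interned list collapses to a single object, while for varied input nothing is saved and every line still pays for the lookup in the intern table.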
Upvotes: 2