komark

Reputation: 197

Should intern be called explicitly on every string occurrence?

Suppose I read a file line by line and save the lines to a list:

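intern('abcd')  # Python 2 built-in; in Python 3 this is sys.intern
lst = []
with open('lines.txt') as f:  # hypothetical file name
    for line in f:
        lst.append(line)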

and the file has five identical lines:

abcd
abcd
abcd
abcd
abcd

When the reading is completed, will there be five copies of 'abcd' in memory or just one?

Upvotes: 1

Views: 94

Answers (1)

Martijn Pieters

Reputation: 1124070

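There will be five copies. The intern() call returns the single interned copy of the string; it doesn't magically make all future strings with the same content interned. To end up with one shared object, you would have to call intern() on every line and store its return value.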
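
As a rough illustration, the sketch below counts how many distinct string objects end up in the list with and without interning each line. It assumes Python 3 on CPython, where the function lives at sys.intern, and it uses io.StringIO as a stand-in for the real file; the helper name count_distinct_objects is made up for this example.

```python
import io
import sys


def count_distinct_objects(text, use_intern):
    """Read text line by line and count the distinct string objects stored."""
    lst = []
    for line in io.StringIO(text):
        # Storing the return value of sys.intern is what shares the object.
        lst.append(sys.intern(line) if use_intern else line)
    # All strings are still alive in lst, so each id identifies one object.
    return len({id(s) for s in lst})


data = "abcd\n" * 5  # five identical lines, as in the question
plain = count_distinct_objects(data, use_intern=False)
shared = count_distinct_objects(data, use_intern=True)
print(plain, shared)
```

On CPython I would expect this to report five objects for the plain loop and one for the interned loop, but object identity is an implementation detail, so treat the numbers as indicative only.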

I would not use interning for file data, however. The biggest advantage of interning strings is in performance-critical sections, where you need your dictionary lookups to be as fast as they can be. Interning lets a lookup skip the equality test, because a pointer (identity) comparison can be used instead.
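
To make the identity shortcut visible, here is a small sketch that uses a made-up key class, CountingKey, instead of real strings, because str does not let you observe its equality calls. It relies on CPython's dictionary lookup checking identity before falling back to ==.

```python
class CountingKey:
    """Hypothetical key type that counts how often __eq__ is invoked."""

    eq_calls = 0

    def __init__(self, value):
        self.value = value

    def __hash__(self):
        return hash(self.value)

    def __eq__(self, other):
        CountingKey.eq_calls += 1
        return isinstance(other, CountingKey) and self.value == other.value


key = CountingKey("abcd")
table = {key: 1}

# Lookup with the very same object: the identity check succeeds first.
table[key]
same_object_calls = CountingKey.eq_calls

# Lookup with an equal but distinct object: the dict must call __eq__.
table[CountingKey("abcd")]
distinct_object_calls = CountingKey.eq_calls - same_object_calls

print(same_object_calls, distinct_object_calls)
```

Interned strings put you in the first situation: the key you look up and the key stored in the dictionary are the same object, so the character-by-character comparison is never needed.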

Interning has a performance penalty too: each time you call intern(), the string is looked up in an internal dictionary to see whether it was already interned. This requires a hash call and zero or more equality tests (zero if the string wasn't interned before and there are no hash collisions, one if an equal string is already interned, more if there are collisions). Calling intern() on each and every line in a file is going to be slowed down by these operations, and unless you have a massive amount of repetition, I don't think the memory gains will be all that great.
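
If you want to check the overhead on your own data, a quick timeit comparison along these lines is one way to do it. This is only a sketch: the sample lines are synthetic, the function names are made up, and it assumes Python 3, where intern is sys.intern.

```python
import sys
import timeit

# Synthetic stand-in for file contents: 10,000 lines, 10 distinct values.
lines = ["abcd" + str(i % 10) for i in range(10_000)]


def collect_plain():
    # Baseline: store every line object as-is.
    return [line for line in lines]


def collect_interned():
    # Each call does a lookup in the interpreter's intern table.
    return [sys.intern(line) for line in lines]


plain_seconds = timeit.timeit(collect_plain, number=50)
interned_seconds = timeit.timeit(collect_interned, number=50)
print("plain:    %.4f s" % plain_seconds)
print("interned: %.4f s" % interned_seconds)
```

The absolute numbers depend on the machine and interpreter. Because strings cache their hash after the first use, repeated runs over the same list will also understate the hashing cost you would pay on freshly read lines.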

Upvotes: 2
