In Python runtime, is there a way to distinguish literal string instances from dynamically created ones?

Question

For example, I want to be able to tell the difference between these two values:

var1 = "Foo"
var2 = "%s" % "Foo"

An example use case for this check is protecting a string.Template-like function from attacks such as exposing the values of local variables.

If it's not possible, is there any good reason for it?
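To make the use case concrete, here is a minimal sketch of the kind of helper I have in mind. The names render and handle_request are hypothetical, and sys._getframe() is a CPython implementation detail, so treat this as an illustration of the risk rather than a real API:

```python
import string
import sys


def render(template):
    """Hypothetical helper: fill $names in `template` from the caller's locals."""
    # sys._getframe(1) is the caller's frame (CPython-specific).
    caller_locals = sys._getframe(1).f_locals
    return string.Template(template).safe_substitute(caller_locals)


def handle_request(user_input):
    secret = "hunter2"  # a local the caller never meant to expose
    name = "world"
    greeting = render("Hello, $name")  # literal template: the intended use
    leaked = render(user_input)        # dynamic template: attacker-controlled
    return greeting, leaked


greeting, leaked = handle_request("$secret")
print(greeting)  # Hello, world
print(leaked)    # hunter2
```

Inside render() both arguments are plain str objects, so the function has no way to reject the second call.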

And a side note...

PEP 498 -- Literal String Interpolation introduces f-strings: string literals that are split into literal parts and expressions at tokenization time.
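The split is visible in the syntax tree. A small sketch using the standard ast module (the template text and the variable name are arbitrary examples, and the node names assume Python 3.8 or later):

```python
import ast

# Parse an f-string expression without evaluating it.
tree = ast.parse('f"Hello, {name}!"', mode="eval")
joined = tree.body

# An f-string is a JoinedStr node whose children alternate between
# literal text (Constant) and embedded expressions (FormattedValue).
parts = [type(value).__name__ for value in joined.values]
print(type(joined).__name__, parts)
```

The expression {name} is compiled as ordinary code in the enclosing scope, which is why the template itself can never arrive from outside at runtime.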

F-strings work much like string.Template(), but they enforce that the input is a literal string, at the cost of a syntax change to the language.

If this kind of check had been available at runtime, f-strings could have been implemented as a function.

Update 1

As noted by @kevin in his answer, CPython has optimizations that allow it to reuse existing instances when there is no need to create new ones. In my first example, "%s" % "Foo" creates nothing new; the result simply refers to the existing "Foo" instance.

But that is not a language requirement, and in fact it does not always happen. Any string formatting beyond the most trivial cases results in a new instance.

In the following example, you can see that although the strings are equal by value, they are not the same object. Using sys.intern() would give us the same instance, though.

In [1]: import dis
   ...: import sys
   ...:
   ...: def foo():
   ...:     var1 = "Foo Bar"
   ...:     var2 = "%s %s" % ("Foo", "Bar")
   ...:     print(f'plain eq: {var1 == var2}')
   ...:     print(f'plain is: {var1 is var2}')
   ...:     print(f'intern is: {sys.intern(var1) is sys.intern(var2)}')
   ...:
   ...: dis.dis(foo)
   ...: foo()
   ...:
  5           0 LOAD_CONST               1 ('Foo Bar')
              2 STORE_FAST               0 (var1)

  6           4 LOAD_CONST               9 ('Foo Bar')
              6 STORE_FAST               1 (var2)

  7           8 LOAD_GLOBAL              0 (print)
             10 LOAD_CONST               5 ('plain eq: ')
             12 LOAD_FAST                0 (var1)
             14 LOAD_FAST                1 (var2)
             16 COMPARE_OP               2 (==)
             18 FORMAT_VALUE             0
             20 BUILD_STRING             2
             22 CALL_FUNCTION            1
             24 POP_TOP

  8          26 LOAD_GLOBAL              0 (print)
             28 LOAD_CONST               6 ('plain is: ')
             30 LOAD_FAST                0 (var1)
             32 LOAD_FAST                1 (var2)
             34 COMPARE_OP               8 (is)
             36 FORMAT_VALUE             0
             38 BUILD_STRING             2
             40 CALL_FUNCTION            1
             42 POP_TOP

  9          44 LOAD_GLOBAL              0 (print)
             46 LOAD_CONST               7 ('intern is: ')
             48 LOAD_GLOBAL              1 (sys)
             50 LOAD_ATTR                2 (intern)
             52 LOAD_FAST                0 (var1)
             54 CALL_FUNCTION            1
             56 LOAD_GLOBAL              1 (sys)
             58 LOAD_ATTR                2 (intern)
             60 LOAD_FAST                1 (var2)
             62 CALL_FUNCTION            1
             64 COMPARE_OP               8 (is)
             66 FORMAT_VALUE             0
             68 BUILD_STRING             2
             70 CALL_FUNCTION            1
             72 POP_TOP
             74 LOAD_CONST               0 (None)
             76 RETURN_VALUE
plain eq: True
plain is: False
intern is: True

As the documentation for sys.intern() puts it, "Normally, the names used in Python programs are automatically interned, and the dictionaries used to hold module, class or instance attributes have interned keys." In other words, strings created at runtime are normally not interned.
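Here is a smaller variant of the same experiment, as a sketch. The helper name build is hypothetical; the pieces are passed as arguments so that the formatting has to happen at call time rather than being folded into a constant by the compiler:

```python
import sys


def build(first, second):
    # The arguments are only known at call time, so this runs at runtime.
    return "%s %s" % (first, second)


literal = "Foo Bar"
dynamic = build("Foo", "Bar")

same_value = literal == dynamic
same_type = type(literal) is type(dynamic)
same_after_intern = sys.intern(literal) is sys.intern(dynamic)
print(same_value, same_type, same_after_intern)
```

All three checks come out true. Whether literal is dynamic holds before interning is an implementation detail, so object identity does not look like something a safety check could rely on.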
