JKD

Reputation: 663

Can I support multiple versions of a python package without clients needing to change code?

I'm trying to support multiple versions of a python package without impacting client code.

Consider the following repo:

.
|-- client_code.py
`-- lib
    |-- __init__.py
    |-- foo.py
    `-- bar/baz.so

client_code.py:

from lib.foo import f
from lib.bar.baz import g
...
f()
g()

I'd like to leave client_code.py unchanged but also have access to both versions of the library. I would ideally like something like this:

lib
|-- __init__.py
|-- v1
|   |-- __init__.py
|   |-- foo.py
|   `-- bar/baz.so
`-- v2
    |-- __init__.py
    |-- foo.py
    `-- bar/baz.so

lib/__init__.py:

import os

if os.environ.get("USE_V2", "0") == "0": # Or some other runtime check
    from .v1 import *
else:
    from .v2 import *

However, the client code fails with the following error:

Traceback (most recent call last):
  File "client_code.py", line 1, in <module>
    from lib.foo import f
ImportError: No module named foo

Note that the problem doesn't have to do with __all__ since the following would also fail with the same exception:

if os.environ.get("USE_V2", "0") == "0":
    from .v1 import foo
else:
    from .v2 import foo

I feel like something like this has to be possible, but I'm having a hard time finding the right keywords to search for, so I'm asking here.

The reason for requiring this (as opposed to just having different runtime environments) is because I would like the library version being used to be user-specified at runtime (e.g., use a specific .so compiled for a given GPU architecture). I could create separate Docker images for all of the permutations, but that would be excessively cumbersome.

A more restricted version of this question was previously asked here (Support two versions of a python package without clients needing to change code). The accepted solution, however, requires a separate "mirror" library with one file per module in lib, whose sole purpose is to redirect to v1 or v2 based on the runtime variable.
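
For illustration, each mirror module would look roughly like this (a sketch only; the exact code in the linked answer may differ):

# lib/foo.py -- one such redirect file would be needed for every module in lib
import os

if os.environ.get("USE_V2", "0") == "0":
    from .v1.foo import *  # re-export the v1 implementation
else:
    from .v2.foo import *  # re-export the v2 implementation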

Is it possible to have a single redirection point for all submodules nested under lib?

Any help would be much appreciated. Thanks ahead of time!

Upvotes: 2

Views: 1428

Answers (1)

Lenormju

Reputation: 4368

When your main script does import lib.foo, the Python import system looks through the directories on sys.path (which includes your PYTHONPATH). In each one, it searches for a lib package (a directory containing an __init__.py file) which itself contains a foo module.
But your lib package does not contain a foo module; it contains v1.foo and v2.foo.
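
You can see this with the layout from the question: only the v1/v2 dotted names exist on disk. For example (importlib.import_module is just the functional form of a plain import statement):

import importlib

importlib.import_module("lib.v1.foo")  # works: lib/v1/foo.py exists on disk
importlib.import_module("lib.foo")     # raises ImportError: there is no lib/foo.py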

A first solution:

from lib import foo

foo.hello()

It works because we do not try to import a lib.foo module; instead we use the foo object defined in the lib package (which happens to be a module too, but as far as we are concerned it looks like any other Python object). And importing the lib package itself works fine.

But I understand that requiring your library to be used exactly this way is not very user-friendly and is error-prone.

If we want to make import lib.foo work as you want, we have to cheat a bit with the import system:

# main.py
import lib.foo

lib.foo.hello()

# lib/__init__.py
import os
import sys

if os.environ.get("USE_V2", "0") == "0":
    from .v1 import foo
else:
    from .v2 import foo

sys.modules["lib.foo"] = foo  # <---- cheating here

Here we manipulate sys.modules, which is the cache of loaded modules.
The first time your Python program imports a certain module, the Python runtime searches for the corresponding file on disk, compiles it to bytecode, constructs the module object, stores it in the cache, and then binds the module object to the chosen name in your context.

Put into an example:

import math  # search, read, compile, create module object, store to cache, bind to "math"
print(math.pi)  # the name "math" is now defined
import math as math2  # hit the cache, bind the module object to "math2"
print(math2.pi)  # the name "math2" is now defined
print(math is math2)  # True

So we can mess with the cache ourselves; that's what the line sys.modules["lib.foo"] = foo does. It tells the Python runtime that for future imports, if asked for lib.foo, it should hand back this foo module.

This also makes one of your two packages importable under two qualified names:

import lib.foo
import lib.v1.foo
import lib.v2.foo

print(lib.foo is lib.v1.foo)  # True when USE_V2=0
print(lib.foo is lib.v2.foo)  # True when USE_V2=1
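
The client code in the question also imports from the nested lib.bar.baz, so to leave it completely unchanged you would register every dotted name it uses. A rough sketch of lib/__init__.py, assuming v1/bar and v2/bar are regular packages (each with its own __init__.py) containing the compiled baz module:

# lib/__init__.py
import os
import sys

if os.environ.get("USE_V2", "0") == "0":
    from .v1 import foo, bar
    from .v1.bar import baz
else:
    from .v2 import foo, bar
    from .v2.bar import baz

# Register every dotted name that client_code.py imports.
sys.modules["lib.foo"] = foo
sys.modules["lib.bar"] = bar
sys.modules["lib.bar.baz"] = baz

With this in place, both from lib.foo import f and from lib.bar.baz import g resolve through the cache entries set above.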

But it has side effects that you may not like:

  • first, it limits IDE support (auto-completion, ...), in my case in PyCharm, because the IDE has its own internal import resolution, which does not find the lib.foo package. There may be ways to solve this problem, I don't know (maybe providing a type hints file? see the sketch after this list).
  • and I will cite the sys.modules documentation:

    This [dictionary] can be manipulated to force reloading of modules and other tricks. However, replacing the dictionary will not necessarily work as expected and deleting essential items from the dictionary may cause Python to fail.
    There are many footguns with manipulating this dictionary.
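
Regarding the IDE limitation, one possible (untested) mitigation is the type hints idea mentioned above: a stub file can tell static tools that lib.foo exists even though there is no lib/foo.py on disk. A minimal sketch, assuming the module exposes hello():

# lib/foo.pyi -- hypothetical stub, read only by type checkers and IDEs, never at runtime
def hello() -> None: ...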

Upvotes: 2
