Dave Forgac

Reputation: 3326

How can I find the module(s) provided by a given Python distribution?

I need to construct a list of modules that are provided by a list of Python distributions specified in a requirements.txt file. The distributions will first be installed so they should be available for inspection locally.

It looks like I should be able to use pip.req.parse_requirements to get the list of distributions from the requirements file. From there, how can I find the name of the module(s) that the distributions provide?

Upvotes: 4

Views: 681

Answers (2)

twooster

Reputation: 931

Since, as you said, distributions are not the modules they contain, we run into a problem: the typical install process for a distribution (which is, afaik, a collection of packages along with an installer) is to download, unpack, and then run setup.py, which handles the rest of the installation.

The upshot is that, even given a Python distribution, you cannot tell what setup.py will do without running it. There may be conventions, and you may be able to extract a lot of information and make good guesses, but running setup.py is the only sure way to see what it installs into site-packages. Hence parse_requirements, or any of the pip internals, won't be useful to you unless you're only interested in the distributions themselves.

So, that being said, I think the best way to manage your problem would be to:

  1. Set up a virtual environment w/o site packages
  2. pip install -r requirements.txt to actually install all packages
  3. Trawl through sys.path, looking for .py and .pyc files, and descending into subfolders that contain an __init__.py (or .pyc) file, to build a list of modules.
  4. Kill that virtualenv and move on your way.

Step three may be doable in other, better ways; I'm not sure. You also still run the risk of missing dynamically created modules or other trickiness, but this should capture the majority of modules.
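One possible alternative for step three, sketched under the assumption of Python 3.8+ (where importlib.metadata is in the standard library), is to read each installed distribution's metadata instead of walking sys.path. The helper names top_level_from_paths and modules_for_distribution are made up for illustration. The path-based fallback is a heuristic: it only recognizes directories and single .py files, so top-level compiled extension modules would be missed.

```python
from importlib import metadata
from pathlib import PurePosixPath


def top_level_from_paths(paths):
    """Guess top-level importable names from a distribution's installed file paths."""
    names = set()
    for raw in paths:
        parts = PurePosixPath(str(raw)).parts
        if not parts:
            continue
        first = parts[0]
        # Skip metadata directories, bytecode caches, and files outside site-packages.
        if first.endswith((".dist-info", ".egg-info")) or first in ("..", "__pycache__"):
            continue
        if len(parts) == 1:
            # A single-file module such as six.py
            if first.endswith(".py"):
                names.add(first[:-3])
        else:
            # The first directory component is treated as a package name.
            names.add(first)
    return names


def modules_for_distribution(dist_name):
    """Return the top-level names an installed distribution appears to provide."""
    try:
        dist = metadata.distribution(dist_name)
    except metadata.PackageNotFoundError:
        return set()
    # Prefer top_level.txt when the distribution ships one.
    top_level = dist.read_text("top_level.txt")
    if top_level:
        return {line.strip() for line in top_level.splitlines() if line.strip()}
    # Otherwise fall back to the recorded file list, which may be absent.
    return top_level_from_paths(dist.files or [])
```

Calling modules_for_distribution on each distribution name taken from requirements.txt would give a per-distribution mapping, which the sys.path walk cannot provide by itself.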

Edit:

Here's some code that should work for everything but zip files:

import sys, os

def walk_modules_os(root):
    def inner_walk(dir_path, mod_path):
        filelist = os.listdir(dir_path)
        pyfiles = set()
        dirs = []
        for name in filelist:
            if os.path.isdir(os.path.join(dir_path, name)):
                dirs.append(name)
            else:
                pre, ext = os.path.splitext(name)
                if ext in ('.py', '.pyc', '.pyo'):
                    pyfiles.add(pre)

        if len(mod_path):
            if '__init__' not in pyfiles:
                return
            pyfiles.remove('__init__')
            yield mod_path

        for pyfile in pyfiles:
            yield mod_path + (pyfile,)

        for directory in dirs:
            sub = os.path.join(dir_path, directory)
            for mod in inner_walk(sub, mod_path + (directory,)):
                yield mod

    root = os.path.realpath(root)
    if not os.path.isdir(root):
        return iter([])
    return iter(inner_walk(root, tuple()))

# you could collect as a set of tuples and do set subtraction, too
for path in sys.path:
    for mod in walk_modules_os(path):
        print(mod)
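The set-subtraction idea mentioned in the comment above could look roughly like this. It is only a sketch: collect and new_modules are hypothetical helpers, and the walker is passed in as a parameter so that walk_modules_os (or any function yielding tuples of name parts for a path) can be used.

```python
def collect(paths, walk):
    """Gather dotted module names found under each path by the given walker."""
    found = set()
    for path in paths:
        for mod in walk(path):
            found.add(".".join(mod))
    return found


def new_modules(before, after):
    """Return, sorted, the module names present in `after` but not in `before`."""
    return sorted(set(after) - set(before))
```

The intended use would be to call collect(sys.path, walk_modules_os) once before installing the requirements and once after, then pass both results to new_modules.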

Edit 2:

Well, crikey. GWW has the right idea. A much better solution than mine.

Upvotes: 2

GWW

Reputation: 44093

You can probably use the built-in pkgutil module if your Python version is 2.3+.

For example,

import sys, pkgutil
mods = set()

# You may not need this part if you don't care about the builtin modules
print(sys.builtin_module_names)
for m in sys.builtin_module_names:
    if m != '__main__':
        mods.add(m)

# Keep only top-level names; walk_packages also yields dotted submodule names
for loader, name, ispkg in pkgutil.walk_packages():
    if '.' not in name:
        mods.add(name)

print(mods)
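If only top-level names are needed, pkgutil.iter_modules may be a lighter option, since walk_packages has to import each package in order to descend into it. A minimal sketch, where top_level_modules is a name invented here and passing a list of directories restricts the scan to those directories:

```python
import pkgutil


def top_level_modules(paths=None):
    """Return the names of top-level modules and packages found on `paths`.

    With paths=None, pkgutil.iter_modules scans everything on sys.path.
    """
    return {info.name for info in pkgutil.iter_modules(paths)}
```

Pointing it at a fresh virtualenv's site-packages directory before and after installation, and subtracting the two sets, would give the names added by the requirements file.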

Upvotes: 3
