normanius

Reputation: 9762

How to extend a python package by binary executables?

My package is written almost entirely in python. However, some functionality is based on executables that are run within python using subprocess. If I set up the package locally, I need to first compile the corresponding C++ project (managed by CMake) and ensure that the resulting binary executables are created in the bin folder. My python scripts then can call these utilities.

My project's folder structure resembles the following one:

root_dir
- bin 
   - binary_tool1
   - binary_tool2
- cpp
   - CMakeLists.txt
   - tool1.cpp
   - tool2.cpp
- pkg_name
   - __init__.py
   - module1.py
   - module2.py
   - ...
LICENSE
README
setup.py

I am now considering creating a distributable python package and publishing it via PyPI/pip. I therefore need to include the build step of the C++ project in the packaging procedure.

So far, I create the python package (without the binary "payload") as described in this tutorial. I now wonder how to extend the packaging procedure such that the C++ binary files are distributed along with the package.

Questions:

I believe that the canonical approach to extend a pure-python package with C code is to create "binary extensions" (e.g. using distutils, or as described here). In this case, the functionality is provided by executables, and not by wrappable C/C++ functions. I would like to avoid redesigning the C++ project to create binary extensions.

Upvotes: 11

Views: 2074

Answers (1)

mwag

Reputation: 4035

I found a number of half-answers to this but nothing complete, so here goes.

Quick and easy (single-platform)

I believe you'll need to remove the dash from your package name. The rest of this answer assumes it's been replaced with an underscore.

  1. Starting with your directory structure, create a copy of bin under pkg_name (or move bin there). If you do not, the files end up installed into two separate python folders, site-packages/pkg_name and site-packages/bin, instead of everything living under site-packages/pkg_name.

Your minimal set of files needed for packaging should now be as follows:

- pkg_name/
  - __init__.py
  - module1.py
  - module2.py
  - bin/
    - binary_tool1
    - binary_tool2
- setup.py
  1. To call your binary executable from the code, use a path relative to __file__:
import os
import subprocess

def run_binary_tool1(args):
    # resolve the binary relative to this module, wherever the package is installed
    cmd = [os.path.join(os.path.dirname(__file__), 'bin', 'binary_tool1')] + args
    p = subprocess.Popen(cmd, ...)
    ...
  1. In setup.py, reference your binaries in addition to your package folder:
from setuptools import setup

setup(
    name='pkg_name',
    version='0.1.0',
    # paths are relative to the pkg_name package directory
    package_data={
        'pkg_name':['bin/binary_tool1','bin/binary_tool2']
    },
    packages=['pkg_name']
)
  1. Do yourself a favor and create a Makefile:
# Makefile for pkg_name python wheel

# PKG_NAME and VERSION should match what is in setup.py
PKG_NAME=pkg_name
VERSION=0.1.0

# Shouldn't need to change anything below here

# determine the target wheel file
WHEEL_TARGET=dist/${PKG_NAME}-${VERSION}-py2.py3-none-any.whl

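# command-style targets: ignore same-named files or directories (e.g. build/)
.PHONY: help setup build install uninstall clean

# help
help: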
    @echo "Usage: make <setup|build|install|uninstall|clean>"

# install packaging utilities (only run this once)
# (recipe lines must be indented with a tab in the actual Makefile)
setup:
    pip install wheel setuptools

# build the wheel
build: ${WHEEL_TARGET}

# install to local python environment
install: ${WHEEL_TARGET}
    pip install ${WHEEL_TARGET}

# uninstall from local python environment
uninstall:
    pip uninstall ${PKG_NAME}

# remove all build artifacts
clean:
    @rm -rf build dist ${PKG_NAME}.egg-info
    @find . -name __pycache__ -exec rm -rf {} \; 2>/dev/null

# build the wheel
${WHEEL_TARGET}: setup.py ${PKG_NAME}/__init__.py ${PKG_NAME}/module1.py ${PKG_NAME}/module2.py ${PKG_NAME}/bin/binary_tool1 ${PKG_NAME}/bin/binary_tool2
    python setup.py bdist_wheel --universal

  1. Now you're ready to roll:
make setup  # only run once if needed
make install # builds the wheel first if it is missing or out of date

## optional:
# make uninstall
# make clean

and in python:

import pkg_name
pkg_name.run_binary_tool1(...)
...
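
The helper from step 2 can be made a bit more defensive. What follows is only a minimal sketch, not something from the setup above: `bundled_binary` and `run_bundled` are hypothetical names, and the `package_dir` parameter is an assumption added so the lookup can be tried outside an installed package (inside pkg_name you would pass `os.path.dirname(__file__)`).

```python
import os
import subprocess


def bundled_binary(name, package_dir):
    """Return the path of an executable shipped in <package_dir>/bin.

    Hypothetical helper: fails early with a clear error instead of
    letting subprocess raise a less obvious one later.
    """
    path = os.path.join(package_dir, "bin", name)
    if not os.path.isfile(path):
        raise FileNotFoundError(f"bundled binary not found: {path}")
    if not os.access(path, os.X_OK):
        raise PermissionError(f"bundled binary is not executable: {path}")
    return path


def run_bundled(name, args, package_dir):
    """Run a bundled executable and return its captured stdout as text."""
    cmd = [bundled_binary(name, package_dir)] + list(args)
    # check=True raises CalledProcessError on a non-zero exit status.
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout
```

Checking the executable bit up front is useful here because that bit can get lost when files are copied around or repackaged, and the resulting error is otherwise easy to misread.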

Multi-platform

You'll almost certainly want to provide more info in your setup() call, so I won't go into detail on that here. More importantly, the above creates a wheel that purports to be universal but really is not. This might be sufficient for your needs if you are sure you will only be distributing on a single platform and you don't mind this mismatch, but would not be suitable for broader distribution.
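
To see what a built wheel claims about itself, you can read the compatibility tags straight from its filename and list the files it carries. This is a rough sketch using only the standard library; the function names are made up for illustration, and the filename parsing assumes the standard wheel naming scheme, where the last three dash-separated fields are the python, ABI, and platform tags.

```python
import os
import zipfile


def wheel_tags(wheel_path):
    """Return the (python, abi, platform) tags encoded in a wheel filename."""
    filename = os.path.basename(wheel_path)
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename}")
    stem = filename[: -len(".whl")]
    # name-version[-build]-python-abi-platform: take the last three fields.
    python_tag, abi_tag, platform_tag = stem.split("-")[-3:]
    return python_tag, abi_tag, platform_tag


def claims_any_platform(wheel_path):
    """True if the wheel is tagged as installable on every platform."""
    return wheel_tags(wheel_path)[2] == "any"


def bundled_files(wheel_path, prefix):
    """List archive members under a prefix such as 'pkg_name/bin/'."""
    with zipfile.ZipFile(wheel_path) as wheel:
        return sorted(
            name
            for name in wheel.namelist()
            if name.startswith(prefix) and not name.endswith("/")
        )
```

For the wheel produced by the Makefile above, `claims_any_platform` would report True even though `bundled_files(..., "pkg_name/bin/")` lists compiled executables, which is exactly the mismatch described here.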

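For multi-platform distribution, you could go the obvious route and create platform-specific wheels (for example, by replacing the --universal flag in the Makefile command above with a platform tag via bdist_wheel's --plat-name option, and adjusting the expected wheel filename to match, etc.).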

Alternatively, if you can compile a binary for every platform, you could package all of the binaries for all platforms in your one universal wheel, and let your python code figure out which binary to call (for example, by checking sys.platform and/or other available variables to determine the platform details).

The advantages of this alternative approach are that the packaging process stays easy, the platform-dispatch code is some simple python, and you can reuse the same binary on multiple platforms, provided that it actually works on them. I would not be surprised if the "all-binaries" approach is frowned on by some, if not many. But hey, python users often say that developer time is king, so this approach has that argument in its favor, as does the overall idea of packaging a binary executable instead of going through all the brain damage of creating a python/C wrapper interface.
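
The "let python pick the binary" idea could look roughly like this. It is a sketch under assumptions: the `bin/<os>-<arch>/` directory layout, the function names, and the small table of architecture aliases are all invented for illustration, so adapt them to whatever platforms you actually build for.

```python
import os
import platform
import sys

# Assumed layout (hypothetical): pkg_name/bin/<os>-<arch>/<tool>[.exe]
_ARCH_ALIASES = {
    "x86_64": "x86_64",
    "amd64": "x86_64",
    "arm64": "arm64",
    "aarch64": "arm64",
}


def platform_subdir(sys_platform=None, machine=None):
    """Map the running platform to a subdirectory name like 'linux-x86_64'."""
    sys_platform = sys.platform if sys_platform is None else sys_platform
    machine = platform.machine() if machine is None else machine
    if sys_platform.startswith("linux"):
        os_name = "linux"
    elif sys_platform == "darwin":
        os_name = "macos"
    elif sys_platform == "win32":
        os_name = "windows"
    else:
        raise RuntimeError(f"no bundled binaries for platform {sys_platform!r}")
    arch = _ARCH_ALIASES.get(machine.lower())
    if arch is None:
        raise RuntimeError(f"no bundled binaries for architecture {machine!r}")
    return f"{os_name}-{arch}"


def platform_binary(tool, package_dir, sys_platform=None, machine=None):
    """Return the path of the binary built for the given (or current) platform."""
    subdir = platform_subdir(sys_platform, machine)
    filename = tool + ".exe" if subdir.startswith("windows") else tool
    return os.path.join(package_dir, "bin", subdir, filename)
```

Inside the package you would call something like `platform_binary('binary_tool1', os.path.dirname(__file__))` and hand the result to subprocess. The explicit `sys_platform` and `machine` parameters exist only so the mapping can be checked without access to every platform.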

Upvotes: 6
