Reputation: 15295
I have a Python extension which uses CPU-specific features,
if available. This is done through a run-time check. If the
hardware supports the POPCNT instruction then it selects one
implementation of my inner loop, if SSSE3 is available then
it selects another, otherwise it falls back to generic versions
of my performance critical kernel. (Some 95%+ of the time is
spent in this kernel.)
Unfortunately, there's a failure mode I didn't expect. I
use -mssse3 and -O3 to compile all of the C code, even though
only one file needs the -mssse3 option. As a result, the other
files are compiled with the expectation that SSSE3 will exist.
This causes a segfault for the line:
start_target_popcount = (int)(query_popcount * threshold);
because the compiler used fisttpl, which is an SSE3 instruction
(-mssse3 also enables SSE3). After all, I told it to assume that
SSSE3 exists.
The Debian packager for my package recently ran into this problem,
because the test machine has a GCC which understands -mssse3 and
generates code with that in mind, but the machine itself has an
older CPU without those instructions.
I want a solution where the same binary can work on older machines and on newer ones, that the Debian maintainer can use for that distro.
Ideally, I would like to say that only one file is compiled
with the -mssse3 option. Since my CPU-specific selector code
isn't part of this file, no SSSE3 code will ever be executed
unless the CPU supports it.
However, I can't figure out any way to tell distutils that
a set of compiler options is specific to a single file.
Is that even possible?
Upvotes: 14
Views: 5128
Reputation: 3010
Unfortunately, the OP's solution works only with Unix compilers. Here is a cross-compiler one
(MSVC doesn't support automatic SSSE3 code generation, so I'll use AVX as the example):
from setuptools import setup, Extension
import distutils.ccompiler

filename = 'example_avx'
compiler_options = {
    'unix': ('-mavx',),
    'msvc': ('/arch:AVX',)
}

def spawn(self, cmd, **kwargs):
    extra_options = compiler_options.get(self.compiler_type)
    if extra_options is not None:
        # filenames are closer to the end of command line
        for argument in reversed(cmd):
            # Check if argument contains a filename. We must check for all
            # possible extensions; checking for target extension is faster.
            if not argument.endswith(self.obj_extension):
                continue

            # check for a filename only to avoid building a new string
            # with variable extension
            off_end = -len(self.obj_extension)
            off_start = -len(filename) + off_end
            if argument.endswith(filename, off_start, off_end):
                if self.compiler_type == 'bcpp':
                    # Borland accepts a source file name at the end,
                    # insert the options before it
                    cmd[-1:-1] = extra_options
                else:
                    cmd += extra_options

                # we're done, restore the original method
                self.spawn = self.__spawn

            # filename is found, no need to search any further
            break

    distutils.ccompiler.spawn(cmd, dry_run=self.dry_run, **kwargs)

distutils.ccompiler.CCompiler.__spawn = distutils.ccompiler.CCompiler.spawn
distutils.ccompiler.CCompiler.spawn = spawn

setup(
    ...
    ext_modules = [
        Extension('extension_name', ['example.c', 'example_avx.c'])
    ],
    ...
)
See my answer here for a cross-compiler way to specify compiler/linker options in general.
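The suffix check inside spawn() is easy to misread, so here is a minimal standalone sketch of just that test. The helper name is_object_for is hypothetical and not part of distutils; it only isolates the two-step endswith logic used above.

```python
def is_object_for(argument, filename, obj_extension):
    """Return True if `argument` ends with `filename` + `obj_extension`.

    Hypothetical helper: mirrors the check in spawn() without building
    the concatenated string.
    """
    if not argument.endswith(obj_extension):
        return False
    # Window that covers the characters just before the extension.
    off_end = -len(obj_extension)
    off_start = -len(filename) + off_end
    return argument.endswith(filename, off_start, off_end)


# Unix-style and MSVC-style object arguments for the same source file.
unix_hit = is_object_for("build/temp/example_avx.o", "example_avx", ".o")
msvc_hit = is_object_for("/Fobuild\\temp\\example_avx.obj", "example_avx", ".obj")
other = is_object_for("build/temp/example.o", "example_avx", ".o")
```

Note that this is a suffix match, so an object file such as my_example_avx.o would also pick up the extra options.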
Upvotes: 2
Reputation: 15295
It's been 5 years but I figured out a solution which I like better than my "CC" wrapper.
The "build_ext" command creates a self.compiler instance. The compiler.compile() method takes the list of all source files to compile. The base class does some setup, then has a compiler._compile() hook for a concrete compiler subclass to implement the actual per-file compilation step.
I felt that this was stable enough that I could intercept the code at that point.
I derived a new command from distutils.command.build_ext.build_ext which tweaks self.compiler._compile to wrap the bound class method with a one-off function attached to the instance:
from distutils.command.build_ext import build_ext

class build_ext_subclass(build_ext):
    def build_extensions(self):
        original__compile = self.compiler._compile
        def new__compile(obj, src, ext, cc_args, extra_postargs, pp_opts):
            # only the SSSE3 kernel keeps the -mssse3 flag
            if src != "src/popcount_SSSE3.c":
                extra_postargs = [s for s in extra_postargs if s != "-mssse3"]
            return original__compile(obj, src, ext, cc_args, extra_postargs, pp_opts)
        self.compiler._compile = new__compile
        try:
            build_ext.build_extensions(self)
        finally:
            # remove the instance attribute so the class method is visible again
            del self.compiler._compile
I then told setup() to use this command-class:
setup(
    ...
    cmdclass = {"build_ext": build_ext_subclass}
)
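If more than one flag or file is involved, the filtering step could be pulled out into a small helper. This is only a sketch: filter_postargs and PER_FILE_FLAGS are hypothetical names, and src/other.c is a made-up path for illustration.

```python
def filter_postargs(src, extra_postargs, per_file_flags):
    """Drop flags that are reserved for other source files.

    per_file_flags maps a flag to the set of sources allowed to use it;
    flags that are not in the mapping are passed through unchanged.
    """
    return [arg for arg in extra_postargs
            if arg not in per_file_flags or src in per_file_flags[arg]]


# Hypothetical configuration: only the SSSE3 kernel may see -mssse3.
PER_FILE_FLAGS = {"-mssse3": {"src/popcount_SSSE3.c"}}

ssse3_args = filter_postargs("src/popcount_SSSE3.c", ["-O3", "-mssse3"], PER_FILE_FLAGS)
other_args = filter_postargs("src/other.c", ["-O3", "-mssse3"], PER_FILE_FLAGS)
```

Inside new__compile, the list comprehension would then become extra_postargs = filter_postargs(src, extra_postargs, PER_FILE_FLAGS).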
Upvotes: 4
Reputation: 20341
A very ugly solution would be to create two (or more) Extension instances, one to hold the SSSE3 code and the other for everything else. You could then tidy the interface up in the Python layer.
c_src = [f for f in my_files if f != 'ssse3_file.c']
c_gen = Extension('c_general', sources=c_src,
                  libraries=[], extra_compile_args=['-O3'])
c_ssse3 = Extension('c_ssse_three', sources=['ssse3_file.c'],
                    libraries=[], extra_compile_args=['-O3', '-mssse3'])
and in an __init__.py somewhere:
from c_general import *
from c_ssse_three import *
Of course you don't need me to write out that code! I know this isn't DRY, and I look forward to reading a better answer!
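One way the Python layer might decide what to import is sketched below. It assumes Linux, where /proc/cpuinfo has a "flags" line listing CPU features; cpu_flags and modules_to_import are hypothetical helpers, and the sample text is hand-written rather than real machine output.

```python
def cpu_flags(cpuinfo_text):
    """Return the feature flags from /proc/cpuinfo-style text (assumed format)."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    # No "flags" line found: report no known features.
    return set()


def modules_to_import(flags):
    """Pick which of the two extension modules are safe to import."""
    names = ["c_general"]
    if "ssse3" in flags:
        names.append("c_ssse_three")
    return names


# Hand-written excerpt for illustration only.
sample = "processor\t: 0\nflags\t\t: fpu sse sse2 pni ssse3 popcnt\n"
selected = modules_to_import(cpu_flags(sample))
```

In __init__.py the sample string would be replaced by the contents of /proc/cpuinfo, and each selected name passed to importlib.import_module.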
Upvotes: 6