This tutorial demonstrates how to make a C++/CUDA-based Python extension for PyTorch. But for ... reasons ... my use-case is more complicated than this and doesn't fit neatly within the Python setuptools framework described by the tutorial.
Is there a way to use cmake to compile a Python library that extends PyTorch?
Yes.
The trick is to use cmake to combine all the C++ and CUDA files we'll need, and to use pybind11 to build the interface we want; fortunately, pybind11 is included with PyTorch.
The code below is collected and kept up to date in this GitHub repo.
Our project consists of two files: CMakeLists.txt, which drives the build, and main.cu, which holds the CUDA kernel and the PyTorch bindings. First, CMakeLists.txt:
cmake_minimum_required (VERSION 3.9)
project(pytorch_cmake_example LANGUAGES CXX CUDA)
find_package(Python REQUIRED COMPONENTS Development)
find_package(Torch REQUIRED)
# 61 targets compute capability 6.1 (Pascal, e.g. GTX 10xx cards); modify for your GPU
if(NOT DEFINED CMAKE_CUDA_ARCHITECTURES)
set(CMAKE_CUDA_ARCHITECTURES 61)
endif()
# List all your code files here
add_library(pytorch_cmake_example SHARED
main.cu
)
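# Note: recent PyTorch versions require a newer C++ standard
# (C++14, and C++17 for PyTorch 2.x), so raise cxx_std_11 below
# if you hit standard-related compile errors.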
target_compile_features(pytorch_cmake_example PRIVATE cxx_std_11)
target_link_libraries(pytorch_cmake_example PRIVATE ${TORCH_LIBRARIES} Python::Python)
# Use if the default GCC version gives issues.
# Similar syntax is used if we need better compilation flags.
target_compile_options(pytorch_cmake_example PRIVATE $<$<COMPILE_LANGUAGE:CUDA>:-ccbin g++-9>)
# Use a variant of this if you're on an earlier cmake than 3.18
# target_compile_options(pytorch_cmake_example PRIVATE $<$<COMPILE_LANGUAGE:CUDA>:-gencode arch=compute_61,code=sm_61>)
Next, main.cu:
#include <c10/cuda/CUDAException.h>
#include <torch/extension.h>
#include <torch/library.h>
using namespace at;
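// Integer ceiling division: (num / denom) rounded up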
int64_t integer_round(int64_t num, int64_t denom){
return (num + denom - 1) / denom;
}
template<class T>
__global__ void add_one_kernel(const T *const input, T *const output, const int64_t N){
// Grid-strided loop
    for(int64_t i = blockDim.x*blockIdx.x + threadIdx.x; i < N; i += blockDim.x*gridDim.x){
output[i] = input[i] + 1;
}
}
/// Adds one to each element of a tensor
Tensor add_one(const Tensor &input){
auto output = torch::zeros_like(input);
// Common values:
// AT_DISPATCH_INDEX_TYPES
// AT_DISPATCH_FLOATING_TYPES
// AT_DISPATCH_INTEGRAL_TYPES
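    // The AT_DISPATCH_* macros instantiate the lambda once per supported
    // dtype, with scalar_t bound to the corresponding C++ type.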
AT_DISPATCH_ALL_TYPES(
input.scalar_type(), "add_one_cuda", [&](){
const auto block_size = 128;
const auto num_blocks = std::min(static_cast<int64_t>(65535), integer_round(input.numel(), block_size));
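    // This launches on the default stream; to respect PyTorch's stream
    // semantics you could instead pass at::cuda::getCurrentCUDAStream()
    // (from ATen/cuda/CUDAContext.h) as the fourth launch parameter.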
add_one_kernel<<<num_blocks, block_size>>>(
input.data_ptr<scalar_t>(),
output.data_ptr<scalar_t>(),
input.numel()
);
// Always test your kernel launches
C10_CUDA_KERNEL_LAUNCH_CHECK();
}
);
return output;
}
/// Note that we can have multiple implementations spread across multiple files, though there should only be one `def`
TORCH_LIBRARY(pytorch_cmake_example, m) {
m.def("add_one(Tensor input) -> Tensor");
m.impl("add_one", c10::DispatchKey::CUDA, TORCH_FN(add_one));
//c10::DispatchKey::CPU is also an option
}
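As the note above says, implementations can live in separate files as long as the def appears only once. For example, a CPU version registered in its own translation unit might look like this (a minimal sketch; the name add_one_cpu is hypothetical, not part of the original repo):
#include <ATen/ATen.h>
#include <torch/library.h>

using namespace at;

// CPU fallback: lets ATen's built-in elementwise add do the work
Tensor add_one_cpu(const Tensor &input){
    return input + 1;
}

// TORCH_LIBRARY_IMPL registers implementations for an operator
// def'ed elsewhere (here, in main.cu)
TORCH_LIBRARY_IMPL(pytorch_cmake_example, CPU, m) {
    m.impl("add_one", TORCH_FN(add_one_cpu));
}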
Configure and build it all from a build subdirectory:
mkdir build && cd build
cmake -DCMAKE_BUILD_TYPE=RelWithDebInfo -DCMAKE_PREFIX_PATH=`python -c 'import torch;print(torch.utils.cmake_prefix_path)'` -GNinja ..
cmake --build .
You can then run the following test script.
import torch
torch.ops.load_library("build/libpytorch_cmake_example.so")
shape = (3, 3, 3)
a = torch.randint(0, 10, shape, dtype=torch.float).cuda()
a_plus_one = torch.ops.pytorch_cmake_example.add_one(a)

# Verify the result
assert torch.equal(a_plus_one, a + 1)
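Because the op is registered through TORCH_LIBRARY, it is reached via torch.ops after load_library. If you would rather expose a plain pybind11 module (pybind11 ships with PyTorch, as noted above), a minimal sketch of the binding looks like this; it assumes the add_one function from main.cu, and the module name must match the shared library's file name (so the cmake target would also need PREFIX "" to drop the lib prefix):
#include <torch/extension.h>

PYBIND11_MODULE(pytorch_cmake_example, m) {
    m.def("add_one", &add_one, "Adds one to each element of a tensor");
}
You could then import it directly with import pytorch_cmake_example and call pytorch_cmake_example.add_one(a).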