Reputation: 1795
I have a very simple dataset (30 rows, 32 columns).
I wrote a Python program to load the data and train an XGBoost model, then save the model to disk.
I also compiled a C++ program that uses libxgboost (C API) and loads the model for inference.
When using the SAME saved model, Python and C++ give different results for the same input (a single row of all zeros).
The xgboost version is 0.90, and I have attached all files (including the numpy data files) here:
https://www.dropbox.com/s/txao5ugq6mgssz8/xgboost_mismatch.tar?dl=0
Here are the outputs of the two programs (the sources of which are in the .tar file):
(which prints a few strings while building the model and THEN prints the single number output)
$ python3 jl_functions_tiny.py
Loading data
Creating model
Training model
Saving model
Deleting model
Loading model
Testing model
[587558.2]
(which emits a single number that clearly doesn't match the Python program's output)
$ ./jl_functions
628180.062500
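
For reference, the Python side boils down to something like this; the file names and parameters below are simplified for illustration, not the actual contents of jl_functions_tiny.py:

import numpy as np
import xgboost as xgb

# Load the 30x32 dataset (file names simplified for illustration).
X = np.load("X.npy")  # shape (30, 32)
y = np.load("y.npy")  # shape (30,)

# Train with default parameters and save the booster to disk.
model = xgb.XGBRegressor()
model.fit(X, y)
model.get_booster().save_model("model.bin")

# Reload from disk, as the C++ program does through the C API,
# and predict on a single row of all zeros.
booster = xgb.Booster()
booster.load_model("model.bin")
row = xgb.DMatrix(np.zeros((1, 32), dtype=np.float32))
print(booster.predict(row))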
Upvotes: 6
Views: 1117
Reputation: 1302
A different seed parameter in Python and in C++ can cause different results, since the algorithm makes use of randomness. Try setting the same seed in both: in Python via seed= in xgb.XGBRegressor (line 11 of your script), or via numpy using numpy.random.seed(0); in C++, use the seed parameter from /workspace/include/xgboost/generic_parameters.h.
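
For example, a minimal sketch of pinning the seeds on the Python side (assuming the model is built as in the question; in 0.90, random_state is the newer alias for seed):

import numpy as np
import xgboost as xgb

# Fix numpy's global RNG in case any preprocessing relies on it.
np.random.seed(0)

# Fix XGBoost's own RNG when constructing the regressor.
model = xgb.XGBRegressor(seed=0)  # random_state=0 is the newer alias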
Upvotes: 1
Reputation: 781
a) You are saving your model with save_model, which has issues with feature-vector ordering; you could try dump_model instead (see xgboost load model in c++ (python -> c++ prediction scores mismatch)).
b) Please check in your Python code that you are not using a sparse matrix to create the model; my intuition says the problem is here.
Disclaimer: I am not an expert or any good in C++, but from what I could figure out this might be the reason for the non-matching predictions, and I don't have an environment handy to test your C++ and share results.
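
For instance, a sketch of both checks on the Python side (assuming a trained XGBRegressor named model and the 30x32 training data X, y from the question):

import numpy as np

# (a) Dump the booster as readable text so the trees and feature
# indices can be compared against what the C++ program loads.
booster = model.get_booster()
booster.dump_model("model_dump.txt")

# (b) Retrain from an explicitly dense float32 array; a sparse
# input treats zero entries as missing, which changes predictions
# for an all-zeros row.
X_dense = np.asarray(X, dtype=np.float32)
model.fit(X_dense, y)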
Upvotes: 0