DuFei

Reputation: 457

How to read Scikit-Learn source code?

I am learning to use scikit-learn to build a decision tree. However, when I stepped through the example code, I found that the core tree-building code is empty.

I am using the following code:

from sklearn import tree
X = [[0, 0], [1, 1]]
Y = [0, 1]
clf = tree.DecisionTreeClassifier()
clf = clf.fit(X, Y)

I went into the fit() method to see the details. I think the most important code for building the decision tree is the following call at line 362 of tree.py:

 builder.build(self.tree_, X, y, sample_weight, X_idx_sorted)

However, when I go into the build method in _tree.py, I find that every method is empty and contains only the 'pass' keyword, for example:

""" Build a decision tree in depth-first fashion. """
def build(self, *args, **kwargs): # real signature unknown
    """ Build a decision tree from the training set (X, y). """
    pass

I am puzzled by this strange code and cannot figure it out. Am I looking at the wrong source code? How can this code run?

I am using PyCharm as my IDE and Anaconda3 as my environment.

Upvotes: 1

Views: 3779

Answers (2)

大王白小甫

Reputation: 1

the location of tree source code

The source code is written in Cython. A '.pxd' file is like a header file in C or C++, and a '.pyx' file is like a .c or .cpp file. The '.pyd' file installed on your machine is the compiled binary built from them.

Upvotes: 0

Theon

Reputation: 11

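Some of the modules in sklearn are written in Cython and shipped in compiled form, so you won't find their source code in your installation folder. They are installed as .pyd files (on Windows), which are binaries and not human-readable. The other .py files simply import them like any library. The empty methods marked "real signature unknown" are placeholder stubs that PyCharm generates for such compiled modules; they are not the code that actually runs.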

You can find the original source code in the scikit-learn Git repository as .pyx files (with the same file name).

Cython syntax differs a little from Python syntax, especially in variable declarations. If you want to change the code, you have to recompile the .pyx into a .pyd.
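To check for yourself whether a module is a compiled extension rather than Python source, something like the following may help. It is only a sketch using the standard library; describe_module is a helper name made up for this example, and the module name sklearn.tree._tree is taken from the question and assumed to match your installed version.

```python
import importlib.machinery
import importlib.util


def describe_module(name):
    """Return (kind, origin) for a module name.

    Hypothetical helper for illustration: it looks up the module's import
    spec and classifies it by the suffix of the file it would load from.
    """
    try:
        spec = importlib.util.find_spec(name)
    except ImportError:
        # Raised when a parent package (e.g. sklearn) is not installed.
        return ("not found", None)
    if spec is None:
        return ("not found", None)

    origin = spec.origin
    if origin is None or origin in ("built-in", "frozen"):
        return ("built-in", origin)
    # EXTENSION_SUFFIXES holds e.g. '.pyd' on Windows or '.so' elsewhere.
    if origin.endswith(tuple(importlib.machinery.EXTENSION_SUFFIXES)):
        return ("compiled extension", origin)
    if origin.endswith(tuple(importlib.machinery.SOURCE_SUFFIXES)):
        return ("python source", origin)
    return ("other", origin)


if __name__ == "__main__":
    # Assumed module name from the question; the result depends on your install.
    print(describe_module("sklearn.tree._tree"))
    print(describe_module("json"))
```

If the first result is reported as a compiled extension, the file at that path is the binary, and the readable .pyx source has to be looked up in the repository instead.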

Upvotes: 1
