Reputation: 221
I want to create a decision tree and then prune it in Python. However, sklearn does not support pruning out of the box. Searching online, I found this: https://github.com/sgenoud/scikit-learn/blob/4a75a4aaebd45e864e28cfca897121d1199e41d9/sklearn/tree/tree.py
But I don't know how to use the file. I tried:
from sklearn.datasets import load_iris
import tree
clf = tree.DecisionTreeClassifier()
iris = load_iris()
clf = clf.fit(iris.data, iris.target)
But I get the error ValueError: Attempted relative import in non-package. Is that not how I import? Do I need to save the files in a different way? Thank you.
Upvotes: 0
Views: 8125
Reputation: 996
Scikit-learn version 0.22 introduced pruning in DecisionTreeClassifier. A new hyperparameter called ccp_alpha lets you calibrate the amount of pruning: larger values prune more aggressively. See the DecisionTreeClassifier documentation for details.
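As a rough sketch of how this can look in practice (assuming scikit-learn 0.22 or newer is installed): cost_complexity_pruning_path lists the candidate alpha values for a fully grown tree, and ccp_alpha then selects how much to prune. The value 0.02 and the train/test split below are arbitrary choices for illustration, not recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Grow an unpruned tree first (ccp_alpha defaults to 0.0, i.e. no pruning).
full_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Candidate alphas at which the tree would lose nodes, from weakest to
# strongest pruning, together with the total leaf impurity at each step.
path = full_tree.cost_complexity_pruning_path(X_train, y_train)
print("candidate alphas:", path.ccp_alphas)

# Refit with a non-zero alpha. 0.02 is an arbitrary illustrative value;
# in practice you would pick one from path.ccp_alphas via cross-validation.
pruned_tree = DecisionTreeClassifier(random_state=0, ccp_alpha=0.02)
pruned_tree.fit(X_train, y_train)

print("nodes before pruning:", full_tree.tree_.node_count)
print("nodes after pruning: ", pruned_tree.tree_.node_count)
print("test accuracy (pruned):", pruned_tree.score(X_test, y_test))
```

Comparing the test accuracy across several alphas is one reasonable way to choose a value.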
Upvotes: 0
Reputation: 33938
If you really want to use sgenoud's 7-year-old fork of scikit-learn from back in 2012, run git clone on the base directory of the repo; don't just try to copy or clone individual files. (Of course, you'll be losing every improvement and fix made since 2012, way back at v0.12.)
But that idea sounds misconceived: you can get shallower trees, which has much the same effect as pruning, by setting the early-stopping parameters of DecisionTreeClassifier: max_depth, min_samples_split, min_samples_leaf, min_impurity_decrease, and min_impurity_split (since deprecated in favor of min_impurity_decrease). See the doc and play around with the parameters; they do what you're asking for. I've done ML for >10 years and never once seen a need to hack the DT source. There are tons of good reasons not to do this and no good reasons to.
(And if you try to play with the DecisionTreeClassifier parameters and still can't get what you want, post a reproducible code example here using an open-source dataset like iris etc.)
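For example, here is a minimal sketch on iris using those early-stopping parameters. The specific values are arbitrary and only meant to show the effect, so tune them for your own data.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Baseline: no stopping criteria, so the tree grows until leaves are pure.
unrestricted = DecisionTreeClassifier(random_state=0).fit(X, y)

# Early stopping: each parameter blocks splits that would make the tree
# deeper or more fragmented. Values here are illustrative only.
shallow = DecisionTreeClassifier(
    max_depth=3,                 # never grow deeper than 3 levels
    min_samples_split=10,        # a node needs >= 10 samples to be split
    min_samples_leaf=5,          # every leaf must keep >= 5 samples
    min_impurity_decrease=0.01,  # skip splits that barely reduce impurity
    random_state=0,
).fit(X, y)

print("unrestricted depth/leaves:",
      unrestricted.get_depth(), unrestricted.get_n_leaves())
print("restricted depth/leaves:  ",
      shallow.get_depth(), shallow.get_n_leaves())
```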
Upvotes: 1
Reputation: 4333
In Python, the modules that make up a package often define routines that depend on one another. In that case you cannot download just one .py file and put it into your workspace (i.e. the directory where your sources are located): the file's relative imports only resolve when it is loaded as part of its package, which is why you get "Attempted relative import in non-package". Instead, download the entire package into that folder and import through the package path, like this:
# a wildcard import; use it only if you are absolutely certain that there will be no namespace conflicts
from sklearn.tree.tree import *
# a safer way is to import the classes/functions you need explicitly
from sklearn.tree.tree import DecisionTreeClassifier
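To illustrate why a single copied file fails, here is a small self-contained sketch. It builds a throwaway package in a temporary directory (demo_pkg, helper.py and tree_like.py are hypothetical names made up for this demo, not part of scikit-learn) and imports the same module both ways. Note that recent Python 3 versions report the failure as an ImportError with a slightly different message than the Python 2 ValueError in the question.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def demo_relative_import():
    """Import a module that uses a relative import, standalone and via its package."""
    with tempfile.TemporaryDirectory() as root:
        pkg = Path(root) / "demo_pkg"  # hypothetical package name
        pkg.mkdir()
        (pkg / "__init__.py").write_text("")
        (pkg / "helper.py").write_text("VALUE = 42\n")
        # Like tree.py in the question, this file relies on a relative import.
        (pkg / "tree_like.py").write_text("from .helper import VALUE\n")
        importlib.invalidate_caches()

        # 1) Treat the file as a lone top-level module (what "import tree" did).
        sys.path.insert(0, str(pkg))
        try:
            importlib.import_module("tree_like")
            standalone_ok = True
        except (ImportError, ValueError) as exc:
            standalone_ok = False
            print("standalone import failed:", exc)
        finally:
            sys.path.remove(str(pkg))
            sys.modules.pop("tree_like", None)

        # 2) Import the same file through its package: the relative import resolves.
        sys.path.insert(0, root)
        try:
            module = importlib.import_module("demo_pkg.tree_like")
            package_value = module.VALUE
        finally:
            sys.path.remove(root)
            for name in ("demo_pkg", "demo_pkg.helper", "demo_pkg.tree_like"):
                sys.modules.pop(name, None)

    return standalone_ok, package_value


standalone_ok, package_value = demo_relative_import()
print("standalone import worked:", standalone_ok)
print("value via package import:", package_value)
```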
Upvotes: -1