Reputation: 13
I am trying to time each round of an AdaBoost algorithm (how long each additional tree takes to build). I installed scikit-learn 1.4.0 with conda (the version listed on their website), along with the other requirements needed to run the code.
Here is my code:
Y, z = parse.getHARData()  # returns my features Y and labels z
Z_train, Z_test, j_train, j_test = train_test_split(Y, z, test_size=0.30, shuffle=True)
b_estimator = DecisionTreeClassifier(max_depth=DEPTH)
ada = AdaBoostClassifier(estimator=b_estimator, n_estimators=NUMTREES)
elapsed_times = []
for stage in range(NUMTREES):
    start_time = time.time()
    # Access and fit the current base estimator
    base_estimator = ada._make_estimator(append=True, random_state=42)
    base_estimator.fit(Z_train, j_train)
    elapsed_time = time.time() - start_time
    elapsed_times.append(elapsed_time)
I was expecting this to start the timer, grow a tree using the forest's previous information, add that tree to the ensemble, stop the time, and append the elapsed time to elapsed_times.
Instead, it's returning this error:
AttributeError Traceback (most recent call last)
Cell In[9], line 7
4 start_time = time.time()
6 # Access and fit the current base estimator
----> 7 base_estimator = ada._make_estimator(append=True, random_state=42)
8 base_estimator.fit(Z_train, j_train)
10 elapsed_time = time.time() - start_time
File ~/anaconda3/envs/ADA/lib/python3.11/site-packages/sklearn/ensemble/_base.py:141, in BaseEnsemble._make_estimator(self, append, random_state)
135 def _make_estimator(self, append=True, random_state=None):
136 """Make and configure a copy of the `estimator_` attribute.
137
138 Warning: This method should be used to properly instantiate new
139 sub-estimators.
140 """
--> 141 estimator = clone(self.estimator_)
142 estimator.set_params(**{p: getattr(self, p) for p in self.estimator_params})
144 if random_state is not None:
AttributeError: 'AdaBoostClassifier' object has no attribute 'estimator_'
Upvotes: 0
Views: 227
Reputation: 12738
The attribute estimator_ gets set early in the fit method, so you can just manually call ada._validate_estimator() before your loop to initialize it, and this particular error should be fixed.
But you're skipping a lot in your loop, most notably the weight updates. You should at least call _boost, and check the source to see if there's anything else to include in the loop. (It's unfortunate there's no partial_fit for AdaBoost...) Or, consider just modifying your local install of sklearn to include the timing you want to measure.
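If relying on private methods feels too fragile, one alternative is to hand-roll the boosting loop with public API only, so the timer covers the tree fit and the weight update together. The sketch below is a simplified discrete SAMME loop, not AdaBoostClassifier's exact internals; `timed_samme`, its parameters, and the synthetic data are hypothetical names used for illustration.

```python
import time

import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier


def timed_samme(X, y, n_rounds=10, max_depth=1, random_state=0):
    """Simplified discrete SAMME; returns (trees, alphas, seconds per round)."""
    n_samples = X.shape[0]
    n_classes = len(np.unique(y))
    sample_weight = np.full(n_samples, 1.0 / n_samples)
    trees, alphas, elapsed_times = [], [], []

    for _ in range(n_rounds):
        start_time = time.perf_counter()

        # Fit the next weak learner on the current sample weights.
        tree = DecisionTreeClassifier(max_depth=max_depth, random_state=random_state)
        tree.fit(X, y, sample_weight=sample_weight)

        # Weighted training error of this learner.
        incorrect = tree.predict(X) != y
        err = np.average(incorrect, weights=sample_weight)

        # Stop and discard the learner if it is no better than chance.
        if err >= 1.0 - 1.0 / n_classes:
            break

        perfect = err <= 0
        alpha = 1.0 if perfect else np.log((1.0 - err) / err) + np.log(n_classes - 1.0)
        if not perfect:
            # Up-weight the misclassified samples, then renormalize.
            sample_weight = sample_weight * np.exp(alpha * incorrect)
            sample_weight /= sample_weight.sum()

        trees.append(tree)
        alphas.append(alpha)
        elapsed_times.append(time.perf_counter() - start_time)

        if perfect:
            break  # nothing left to boost

    return trees, alphas, elapsed_times


# Synthetic stand-in for the question's HAR data (assumption).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
trees, alphas, elapsed_times = timed_samme(X, y, n_rounds=5)
```

Each entry of `elapsed_times` then covers one full round (fit, error, weight update), and the list can be shorter than `n_rounds` if the loop stops early.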
Upvotes: 1