The boxplot would not plot as expected.
This is what it actually plotted: [image not available]
This is what it is supposed to plot: [image not available]
This is the code and data:
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score  # formerly sklearn.cross_validation (removed in 0.20)

# X, Y: feature matrix and class labels, defined elsewhere
scores = []
for ne in range(1, 41):  # ne is the number of trees
    clf = RandomForestClassifier(n_estimators=ne)
    score_list = cross_val_score(clf, X, Y, cv=10)  # one accuracy per CV fold
    scores.append(score_list)

sns.boxplot(scores)  # scores is a list of arrays
plt.xlabel('Number of trees')
plt.ylabel('Classification score')
plt.title('Classification score as a function of the number of trees')
plt.show()
scores =
[array([ 0.8757764 , 0.86335404, 0.75625 , 0.85 , 0.86875 ,
0.81875 , 0.79375 , 0.79245283, 0.8490566 , 0.85534591]),
array([ 0.89440994, 0.8447205 , 0.79375 , 0.85 , 0.8625 ,
0.85625 , 0.86875 , 0.88050314, 0.86792453, 0.8427673 ]),
array([ 0.91304348, 0.9068323 , 0.83125 , 0.84375 , 0.8875 ,
0.875 , 0.825 , 0.83647799, 0.83647799, 0.87421384]),
array([ 0.86956522, 0.86956522, 0.85 , 0.875 , 0.88125 ,
0.86875 , 0.8625 , 0.8490566 , 0.86792453, 0.89308176]),
....]
I would first create a pandas DataFrame out of scores:
import pandas as pd
In [15]: scores
Out[15]:
[array([ 0.8757764 , 0.86335404, 0.75625 , 0.85 , 0.86875 , 0.81875 , 0.79375 , 0.79245283, 0.8490566 , 0.85534591]),
array([ 0.89440994, 0.8447205 , 0.79375 , 0.85 , 0.8625 , 0.85625 , 0.86875 , 0.88050314, 0.86792453, 0.8427673 ]),
array([ 0.91304348, 0.9068323 , 0.83125 , 0.84375 , 0.8875 , 0.875 , 0.825 , 0.83647799, 0.83647799, 0.87421384]),
array([ 0.86956522, 0.86956522, 0.85 , 0.875 , 0.88125 , 0.86875 , 0.8625 , 0.8490566 , 0.86792453, 0.89308176])]
In [16]: df = pd.DataFrame(scores)
In [17]: df
Out[17]:
0 1 2 3 4 5 6 7 8 9
0 0.875776 0.863354 0.75625 0.85000 0.86875 0.81875 0.79375 0.792453 0.849057 0.855346
1 0.894410 0.844720 0.79375 0.85000 0.86250 0.85625 0.86875 0.880503 0.867925 0.842767
2 0.913043 0.906832 0.83125 0.84375 0.88750 0.87500 0.82500 0.836478 0.836478 0.874214
3 0.869565 0.869565 0.85000 0.87500 0.88125 0.86875 0.86250 0.849057 0.867925 0.893082
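Note that pd.DataFrame(scores) turns each array into a row, so in the frame above the rows are the tree counts and the columns 0 to 9 are the CV folds. A wide-form boxplot draws one box per column, so getting one box per number of trees needs the transpose. The minimal sketch below uses small made-up score arrays (not the real results) and labels the columns with the tree count; the names wide and n_trees are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for `scores`: 4 tree counts x 10 CV folds (not real results)
rng = np.random.default_rng(0)
n_trees = range(1, 5)
scores = [rng.uniform(0.75, 0.92, size=10) for _ in n_trees]

# Each array becomes a row: rows = tree counts, columns = folds 0..9
df = pd.DataFrame(scores)

# Transpose so each column holds the 10 fold scores for one tree count,
# and label the columns with the actual number of trees
wide = pd.DataFrame(scores, index=list(n_trees)).T
wide.columns.name = "n_trees"

print(df.shape)            # (4, 10)
print(wide.shape)          # (10, 4)
print(list(wide.columns))  # [1, 2, 3, 4]
```

Passing wide to sns.boxplot(data=wide) should then draw one box per column, i.e. one per number of trees, with the tree count on the x axis.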
Now we can easily plot the boxplots:
In [18]: sns.boxplot(data=df.T)  # transpose: one box per row of df, i.e. per number of trees
Out[18]: <matplotlib.axes._subplots.AxesSubplot at 0xd121128>
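An alternative, sketched here with the same kind of made-up data, is to reshape the scores into long (tidy) form with one row per (fold, number of trees) pair, which lets seaborn's boxplot take explicit x and y columns. The column names fold, n_trees and score are hypothetical choices, not anything from the original code.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for `scores`: 3 tree counts x 10 CV folds (not real results)
rng = np.random.default_rng(1)
n_trees = [1, 2, 3]
scores = [rng.uniform(0.75, 0.92, size=10) for _ in n_trees]

# Wide frame: one column per tree count, one row per CV fold
wide = pd.DataFrame(dict(zip(n_trees, scores)))
wide.index.name = "fold"

# Long (tidy) frame: one row per (fold, n_trees) pair
long_df = wide.reset_index().melt(
    id_vars="fold", var_name="n_trees", value_name="score"
)

print(long_df.shape)  # (30, 3)
print(long_df.head())
```

With this frame, sns.boxplot(x="n_trees", y="score", data=long_df) should place one box per tree count; the long form also makes it straightforward to add further grouping columns later.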