Ancalagon BerenLuthien

Reputation: 1194

boxplot (from seaborn) would not plot as expected

The boxplot does not plot as expected. This is what it actually plots: [image]

This is what it is supposed to plot: [image]

This is the code and data:

    import matplotlib.pyplot as plt
    import seaborn as sns
    from sklearn.ensemble import RandomForestClassifier
    # sklearn.cross_validation was removed in scikit-learn 0.20
    from sklearn.model_selection import cross_val_score

    # X and Y are the feature matrix and labels (not shown here)
    scores = []
    for ne in range(1, 41):  # ne is the number of trees
        clf = RandomForestClassifier(n_estimators=ne)
        score_list = cross_val_score(clf, X, Y, cv=10)
        scores.append(score_list)

    # Plot once, after all scores have been collected
    sns.boxplot(scores)  # scores is a list of arrays
    plt.xlabel('Number of trees')
    plt.ylabel('Classification score')
    plt.title('Classification score as a function of the number of trees')
    plt.show()

scores =

[array([ 0.8757764 ,  0.86335404,  0.75625   ,  0.85      ,  0.86875   ,
         0.81875   ,  0.79375   ,  0.79245283,  0.8490566 ,  0.85534591]),
 array([ 0.89440994,  0.8447205 ,  0.79375   ,  0.85      ,  0.8625    ,
         0.85625   ,  0.86875   ,  0.88050314,  0.86792453,  0.8427673 ]),
 array([ 0.91304348,  0.9068323 ,  0.83125   ,  0.84375   ,  0.8875    ,
         0.875     ,  0.825     ,  0.83647799,  0.83647799,  0.87421384]),
 array([ 0.86956522,  0.86956522,  0.85      ,  0.875     ,  0.88125   ,
         0.86875   ,  0.8625    ,  0.8490566 ,  0.86792453,  0.89308176]),

....]

Upvotes: 2

Views: 591

Answers (1)

MaxU - stand with Ukraine

Reputation: 210852

I would first create a pandas DataFrame out of scores:

import pandas as pd
import seaborn as sns

In [15]: scores
Out[15]:
[array([ 0.8757764 ,  0.86335404,  0.75625   ,  0.85      ,  0.86875   ,  0.81875   ,  0.79375   ,  0.79245283,  0.8490566 ,  0.85534591]),
 array([ 0.89440994,  0.8447205 ,  0.79375   ,  0.85      ,  0.8625    ,  0.85625   ,  0.86875   ,  0.88050314,  0.86792453,  0.8427673 ]),
 array([ 0.91304348,  0.9068323 ,  0.83125   ,  0.84375   ,  0.8875    ,  0.875     ,  0.825     ,  0.83647799,  0.83647799,  0.87421384]),
 array([ 0.86956522,  0.86956522,  0.85      ,  0.875     ,  0.88125   ,  0.86875   ,  0.8625    ,  0.8490566 ,  0.86792453,  0.89308176])]

In [16]: df = pd.DataFrame(scores)

In [17]: df
Out[17]:
          0         1        2        3        4        5        6         7         8         9
0  0.875776  0.863354  0.75625  0.85000  0.86875  0.81875  0.79375  0.792453  0.849057  0.855346
1  0.894410  0.844720  0.79375  0.85000  0.86250  0.85625  0.86875  0.880503  0.867925  0.842767
2  0.913043  0.906832  0.83125  0.84375  0.88750  0.87500  0.82500  0.836478  0.836478  0.874214
3  0.869565  0.869565  0.85000  0.87500  0.88125  0.86875  0.86250  0.849057  0.867925  0.893082

Now we can easily plot the boxplots:

In [18]: sns.boxplot(data=df)
Out[18]: <matplotlib.axes._subplots.AxesSubplot at 0xd121128>

[image]

Upvotes: 2
