Reputation: 2988
I have two NumPy arrays, as follows:
clf_scores = numpy.array(
[[ 0.66333666, 0.65634366, 0.63836164, 0.64435564, 0.658 ,
0.641 , 0.67167167, 0.66066066, 0.67167167, 0.65165165],
[ 0.6983017 , 0.70629371, 0.70529471, 0.68331668, 0.702 ,
0.688 , 0.71371371, 0.69269269, 0.70770771, 0.6996997 ],
[ 0.65934066, 0.68531469, 0.65834166, 0.66333666, 0.677 ,
0.668 , 0.68568569, 0.68668669, 0.6996997 , 0.68168168],
.... .... .... .... ....
[ 0.68731269, 0.71928072, 0.7002997 , 0.70929071, 0.723 ,
0.697 , 0.68968969, 0.71271271, 0.72672673, 0.6996997 ],
[ 0.68731269, 0.72027972, 0.6973027 , 0.70729271, 0.726 ,
0.695 , 0.68568569, 0.71271271, 0.72572573, 0.6996997 ],
[ 0.69030969, 0.71728272, 0.6983017 , 0.70929071, 0.725 ,
0.698 , 0.68668669, 0.71371371, 0.72572573, 0.6996997 ]])
and
Trees = numpy.array(
[ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,
14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26,
27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39,
40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52,
53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65,
66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78,
79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91,
92, 93, 94, 95, 96, 97, 98, 99, 100])
These arrays have shapes (100, 10) and (100,).
How do I plot these two arrays using seaborn.boxplot?
I tried to boxplot these two NumPy arrays as follows:
sns.boxplot(clf_scores,Trees)
However, I got the following error:
NotImplementedError: > 1 ndim Categorical are not supported at this time
Please tell me how to correct this to get an appropriate boxplot.
PS: The data set was obtained by computing cross_val_score of RandomForestClassifier with nTrees = 100.
The correct output should look something like the following:
Upvotes: 0
Views: 939
Reputation: 465
The easiest approach, to me, is to convert your data to a pandas DataFrame first and then plot it using seaborn:
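import numpy as np
import pandas as pd
import seaborn as sns
# Transpose so that each of the 100 tree counts becomes one column (shape (10, 100))
df = pd.DataFrame(np.transpose(clf_scores))
# Wide-form input: seaborn draws one box per column
sns.boxplot(data=df)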
The DataFrame df corresponds to the "wide-form DataFrame" described in the boxplot documentation: each column is drawn as its own box. In your call, seaborn interpreted the two positional arguments as the variables to plot rather than as a table of values, so it tried to treat the data as categorical, which it is not.
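If you also want the x-axis to show the actual tree counts rather than the column positions 0 to 99, one option is to label the columns, or to reshape to long form. The following is only a sketch: the scores are random stand-ins rather than real cross-validation results, and the names wide_df, long_df, n_trees and score are my own choices for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the question's data (assumption: random values,
# not real cross-validation scores).
rng = np.random.default_rng(0)
trees = np.arange(1, 101)                            # shape (100,)
clf_scores = rng.uniform(0.60, 0.75, size=(100, 10)) # shape (100, 10)

# Wide form: one column per tree count, one row per CV fold.
# Using `trees` as column labels puts the tree counts on the x-axis.
wide_df = pd.DataFrame(clf_scores.T, columns=trees)

# Long form: one row per (tree count, fold) pair.
long_df = wide_df.melt(var_name="n_trees", value_name="score")

print(wide_df.shape)
print(long_df.shape)
```

Either sns.boxplot(data=wide_df) or sns.boxplot(data=long_df, x="n_trees", y="score") should then draw one box per tree count. The long form is handy if you later want to add a hue variable.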
Upvotes: 2