Reputation: 1522
A quick question as I'm currently changing from R to pandas for some projects:
I get the following print output from metrics.classification_report
from sci-kit learn
:
precision recall f1-score support
0 0.67 0.67 0.67 3
1 0.50 1.00 0.67 1
2 1.00 0.80 0.89 5
avg / total 0.83 0.78 0.79 9
I want to use this (and similar ones) as a matrix/dataframe so, that I could subset it to extract, say the precision of class 0.
In R, I'd give the first "column" a name like 'outcome_class' and then subset it:
my_dataframe[my_dataframe$class_outcome == 1, 'precision']
And I can do this in pandas but the dataframe
that I want to use is simply a string see sckikit's doc
How can I make the table output here to a useable dataframe in pandas?
Upvotes: 0
Views: 92
Reputation:
Assign it to a variable, s
:
s = classification_report(y_true, y_pred, target_names=target_names)
Or directly:
s = '''
precision recall f1-score support
class 0 0.50 1.00 0.67 1
class 1 0.00 0.00 0.00 1
class 2 1.00 0.67 0.80 3
avg / total 0.70 0.60 0.61 5
'''
Use that as the string input for StringIO:
import io # For Python 2.x use import StringIO
df = pd.read_table(io.StringIO(s), sep='\s{2,}') # For Python 2.x use StringIO.StringIO(s)
df
Out:
precision recall f1-score support
class 0 0.5 1.00 0.67 1
class 1 0.0 0.00 0.00 1
class 2 1.0 0.67 0.80 3
avg / total 0.7 0.60 0.61 5
Now you can slice it like an R data.frame:
df.loc['class 2']['f1-score']
Out: 0.80000000000000004
Here, classes are the index of the DataFrame. You can use reset_index()
if you want to use it as a regular column:
df = df.reset_index().rename(columns={'index': 'outcome_class'})
df.loc[df['outcome_class']=='class 1', 'support']
Out:
1 1
Name: support, dtype: int64
Upvotes: 2