ben_aaron

Reputation: 1522

Turn a console-friendly report string into a usable pandas DataFrame in Python

A quick question as I'm currently changing from R to pandas for some projects:

I get the following printed output from metrics.classification_report in scikit-learn:

                   precision    recall    f1-score   support

      0            0.67      0.67       0.67         3
      1            0.50      1.00       0.67         1
      2            1.00      0.80       0.89         5

 avg / total       0.83      0.78       0.79         9

I want to use this (and similar reports) as a matrix/DataFrame so that I can subset it to extract, say, the precision of class 0.

In R, I'd give the first "column" a name like 'outcome_class' and then subset it: my_dataframe[my_dataframe$outcome_class == 1, 'precision']

I can do the same kind of subsetting in pandas, but the table I want to use is simply a string (see scikit-learn's documentation).

How can I turn the table output shown here into a usable DataFrame in pandas?

Upvotes: 0

Views: 92

Answers (1)

user2285236

Assign it to a variable, s:

from sklearn.metrics import classification_report
s = classification_report(y_true, y_pred, target_names=target_names)

Or directly:

s = '''
             precision    recall  f1-score   support

    class 0       0.50      1.00      0.67         1
    class 1       0.00      0.00      0.00         1
    class 2       1.00      0.67      0.80         3

avg / total       0.70      0.60      0.61         5
'''

Use that as the string input for StringIO:

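import io  # For Python 2.x use import StringIO
import pandas as pd
# Fields are separated by two or more spaces; a regex separator needs the Python parsing engine
df = pd.read_table(io.StringIO(s), sep=r'\s{2,}', engine='python')  # For Python 2.x use StringIO.StringIO(s)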
df
Out: 
             precision  recall  f1-score  support
class 0            0.5    1.00      0.67        1
class 1            0.0    0.00      0.00        1
class 2            1.0    0.67      0.80        3
avg / total        0.7    0.60      0.61        5
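
To see why splitting on two or more whitespace characters works for this layout, here is a minimal standard-library sketch of the same idea without pandas. The helper name parse_report is hypothetical, and the sketch assumes the report keeps at least two spaces between fields while row labels such as "avg / total" contain only single spaces.

```python
import re

# Sample report text, copied from the string used in this answer.
report = '''
             precision    recall  f1-score   support

    class 0       0.50      1.00      0.67         1
    class 1       0.00      0.00      0.00         1
    class 2       1.00      0.67      0.80         3

avg / total       0.70      0.60      0.61         5
'''


def parse_report(text):
    """Parse a report string into {row_label: {column_name: value}}.

    Hypothetical helper: assumes fields are separated by 2+ spaces.
    """
    # Drop blank lines and surrounding whitespace.
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    # The header row has no label, so it yields exactly the column names.
    columns = re.split(r"\s{2,}", lines[0])
    table = {}
    for line in lines[1:]:
        # Single spaces stay inside the label; runs of 2+ spaces split fields.
        label, *values = re.split(r"\s{2,}", line)
        # Every value is stored as a float, including the support counts.
        table[label] = {col: float(val) for col, val in zip(columns, values)}
    return table


table = parse_report(report)
print(table["class 2"]["f1-score"])
```

The data rows have one more field than the header, which is also why pandas treats the first field as the index. If your scikit-learn version is 0.20 or newer, classification_report also accepts output_dict=True and returns a nested dict directly, which avoids parsing the string at all.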

Now you can slice it like an R data.frame:

df.loc['class 2', 'f1-score']
Out: 0.80000000000000004

Here, classes are the index of the DataFrame. You can use reset_index() if you want to use it as a regular column:

df = df.reset_index().rename(columns={'index': 'outcome_class'})
df.loc[df['outcome_class'] == 'class 1', 'support']
Out: 
1    1
Name: support, dtype: int64

Upvotes: 2
