der_radler

Reputation: 579

strsplit() output as a data frame in R

I have some results from a model in Python, which I have saved as a .txt file to render in R Markdown.

The .txt file looks like this:

             precision    recall  f1-score   support

          0       0.71      0.83      0.77      1078
          1       0.76      0.61      0.67       931

avg / total       0.73      0.73      0.72      2009

I read the file into R as follows:

x <- read.table(file = 'report.txt', fill = T, sep = '\n')

When I do this, R stores the result as a single column (V1) instead of five columns, as shown below:

                                                    V1
1              precision    recall  f1-score   support
2           0       0.71      0.83      0.77      1078
3           1       0.76      0.61      0.67       931
4 avg / total       0.73      0.73      0.72      2009

I tried using strsplit() to split the columns, but it doesn't work:

strsplit(as.character(x$V1), split = "|", fixed = T)

Maybe strsplit() is not the right approach? How do I get around this so that I end up with a [4x5] data frame?

Thanks a lot.

Upvotes: 0

Views: 705

Answers (2)

der_radler

Reputation: 579

Since it is much simpler to have Python output a CSV, I am posting an alternative here in case it is useful; even in Python it needs some work.

import pandas as pd

def report_to_csv(report, title):
    report_data = []
    lines = report.split('\n')

    # loop through the per-class rows: skip the header and the blank line
    # after it, and stop before the blank line that precedes the summary
    # (assumes the report text ends with a newline, as in the layout above)
    for line in lines[2:-3]:
        row = {}
        # split from the right on runs of whitespace, so the parse does not
        # depend on the exact column widths
        row_data = line.strip().rsplit(None, 4)
        row['class'] = row_data[0]
        row['precision'] = float(row_data[1])
        row['recall'] = float(row_data[2])
        row['f1_score'] = float(row_data[3])
        row['support'] = float(row_data[4])
        report_data.append(row)

    df = pd.DataFrame(report_data)

    # read the final summary line; rsplit keeps "avg / total" in one piece
    line_data = lines[-2].strip().rsplit(None, 4)
    summary_dat = []
    row2 = {}
    row2['class'] = line_data[0]
    row2['precision'] = float(line_data[1])
    row2['recall'] = float(line_data[2])
    row2['f1_score'] = float(line_data[3])
    row2['support'] = float(line_data[4])
    summary_dat.append(row2)

    summary_df = pd.DataFrame(summary_dat)

    # concatenate both data frames and write them out as one CSV
    report_final = pd.concat([df, summary_df], axis=0)
    report_final.to_csv(title + 'cm_report.csv', index=False)

The function was inspired by this solution.

Upvotes: 0

AndS.

Reputation: 8110

Not very elegant, but this works. First we read the raw text, then we use a regex to remove the whitespace around the slash in "avg / total", drop the empty lines, and convert the rest to a CSV-readable format. Finally, we read the CSV.

library(stringr)
library(magrittr)
library(purrr)

text <- str_replace_all(readLines("~/Desktop/test.txt"), "\\s(?=/)|(?<=/)\\s", "") %>% 
  .[which(nchar(.)>0)] %>% 
  str_split(pattern = "\\s+") %>% 
  map(., ~paste(.x, collapse = ",")) %>% 
  unlist

read.csv(textConnection(text))
#>           precision recall f1.score support
#> 0              0.71   0.83     0.77    1078
#> 1              0.76   0.61     0.67     931
#> avg/total      0.73   0.73     0.72    2009

Created on 2018-09-20 by the reprex package (v0.2.0).
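
For anyone who would rather do the same clean-up on the Python side before handing the file to R, here is a minimal sketch of the same idea using only the standard library. It is not the code above, just the same steps: join "avg / total" into one token, drop blank lines, split on whitespace, and write comma-separated rows. The sample text is copied from the question, and the function names and the "class" column name are made up for illustration.

```python
import csv
import io
import re

# Sample text copied from the question; in practice it would be read from report.txt.
REPORT = """\
             precision    recall  f1-score   support

          0       0.71      0.83      0.77      1078
          1       0.76      0.61      0.67       931

avg / total       0.73      0.73      0.72      2009
"""


def report_to_rows(text):
    """Turn the whitespace-aligned report into a list of equal-length rows."""
    rows = []
    for line in text.splitlines():
        # Remove the whitespace around "/" so "avg / total" becomes one token.
        line = re.sub(r"\s*/\s*", "/", line)
        fields = line.split()  # split on any run of whitespace
        if fields:             # skip blank lines
            rows.append(fields)
    # The header has no label for the first column, so add one
    # ("class" is a hypothetical name, not something the report defines).
    rows[0] = ["class"] + rows[0]
    return rows


def rows_to_csv(rows):
    """Serialise the rows as CSV text that read.csv can load."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


rows = report_to_rows(REPORT)
print(rows_to_csv(rows))
```

Writing that string to a file should give something `read.csv()` can load directly, with every row having five fields. Unlike the R regex above, this removes any amount of whitespace around the slash, which may or may not be what you want if a class label legitimately contains " / ".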

Upvotes: 1
