How to Create a table with data from array output in Python

Question

I printed out a composed array and saved it to a text file. It looks like this:

({
    ngram_a67e6f3205f0-n: 1,
    logreg_c120232d9faa-regParam: 0.01,
    cntVec_9c0e7831261d-vocabSize: 10000
},0.8580469779197205)
({
    ngram_a67e6f3205f0-n: 2,
    logreg_c120232d9faa-regParam: 0.01,
    cntVec_9c0e7831261d-vocabSize: 10000
},0.8880895806519427)
({
    ngram_a67e6f3205f0-n: 3,
    logreg_c120232d9faa-regParam: 0.01,
    cntVec_9c0e7831261d-vocabSize: 10000
},0.8656452460818544)

I would like to extract the data into a pandas DataFrame that looks like this:

1, 10000, 0.8580469779197205
2, 10000, 0.8880895806519427

Massifox · Accepted Answer

My advice is to change the input format of your file, if possible. It would greatly simplify your life.
If this is not possible, the following code solves your problem:

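import pandas as pd
import re

# Everything between an opening '(' and the next ')' is one tuple
pattern_tuples = r'(?<=\()[^\)]*'
# A number (integer, decimal or scientific notation) preceded by a space or a comma
pattern_numbers = r'[ ,](?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?'
col_name = ['ngram', 'logreg', 'vocabSize', 'score']

with open('test.txt', 'r') as f:
    matches = re.findall(pattern_tuples, f.read())

# Drop the leading comma (float() ignores a leading space) and convert each value
arr_data = [[float(val.replace(',', '')) for val in re.findall(pattern_numbers, match)] for match in matches]
df = pd.DataFrame(arr_data, columns=col_name).astype({'ngram': 'int', 'vocabSize': 'int'})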

and gives:

   ngram  logreg  vocabSize     score
0      1    0.01      10000  0.858047
1      2    0.01      10000  0.888090
2      3    0.01      10000  0.865645

Brief explanation

Read the file.
Use re.findall with the regex pattern_tuples to find all the tuples in the file.
For each tuple, use the regex pattern_numbers to find the 4 numerical values of interest. This gives you a list of lists containing your data.
Load the results into a pandas DataFrame.
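
As a quick check of the steps above, here is a minimal sketch that runs the same two regexes on an inline string instead of test.txt. The sample variable holds the first two records from the question; the names sample, tuples and rows are only for illustration.

```python
import re

# First two records from the question, inlined so no file is needed (illustrative only).
sample = """({
    ngram_a67e6f3205f0-n: 1,
    logreg_c120232d9faa-regParam: 0.01,
    cntVec_9c0e7831261d-vocabSize: 10000
},0.8580469779197205)
({
    ngram_a67e6f3205f0-n: 2,
    logreg_c120232d9faa-regParam: 0.01,
    cntVec_9c0e7831261d-vocabSize: 10000
},0.8880895806519427)
"""

pattern_tuples = r'(?<=\()[^\)]*'
pattern_numbers = r'[ ,](?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?'

# Step 1: one string per "( ... )" group
tuples = re.findall(pattern_tuples, sample)

# Step 2: the numbers inside each group, converted to float
rows = [
    [float(tok.replace(',', '')) for tok in re.findall(pattern_numbers, t)]
    for t in tuples
]

print(rows)
```

Passing rows to pd.DataFrame(rows, columns=col_name) should then give the first two rows of the table shown above. The hex digits inside names such as ngram_a67e6f3205f0 are not picked up, because pattern_numbers only accepts a number that directly follows a space or a comma.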

Extra

Here's how you could save your CV results in JSON format, so you can manage them more easily:

Create a cv_results list to hold the CV results.
Each CV iteration gives you a tuple t with the results, which you convert to a dictionary and append to cv_results.
At the end of the CV loop, save the results in JSON format.

import json

cv_results = []
range_cv = range(1)  # placeholder: replace with your own CV loop

for _ in range_cv:  # CV loop
    # ... Calculate the results of the CV run in t
    t = ({'ngram_a67e6f3205f0-n': 1,
          'logreg_c120232d9faa-regParam': 0.01,
          'cntVec_9c0e7831261d-vocabSize': 10000},
         0.8580469779197205)  # FAKE DATA for this example

    # Append the results as a dict
    cv_results.append({'res': t[0], 'score': t[1]})

# Store the results in JSON format
with open('cv_results.json', 'w') as outfile:
    json.dump(cv_results, outfile, indent=4)

Now you can read the JSON file back and access all the fields as in a normal Python dictionary:

with open('cv_results.json') as json_file:
    data = json.load(json_file)

data[0]['score']
# output: 0.8580469779197205
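
If you want the table from the question back out of the JSON data, one option is to flatten each record first. This is only a sketch: the flatten helper is hypothetical, it assumes the 'res'/'score' layout used above, and it shortens each key to the text after the last '-' (so 'n', 'regParam' and 'vocabSize').

```python
import json

# Records in the 'res'/'score' layout used above (values copied from the question).
cv_results = [
    {'res': {'ngram_a67e6f3205f0-n': 1,
             'logreg_c120232d9faa-regParam': 0.01,
             'cntVec_9c0e7831261d-vocabSize': 10000},
     'score': 0.8580469779197205},
    {'res': {'ngram_a67e6f3205f0-n': 2,
             'logreg_c120232d9faa-regParam': 0.01,
             'cntVec_9c0e7831261d-vocabSize': 10000},
     'score': 0.8880895806519427},
]

def flatten(record):
    """Hypothetical helper: merge the 'res' dict and the score into one flat dict."""
    # Keep only the part of each key after the last '-', e.g. 'n' or 'vocabSize'
    row = {key.rsplit('-', 1)[-1]: value for key, value in record['res'].items()}
    row['score'] = record['score']
    return row

# Round-trip through JSON text to mimic writing the file and reading it back
data = json.loads(json.dumps(cv_results))
rows = [flatten(record) for record in data]

print(rows[0])
```

A list of flat dicts like rows can be handed straight to pd.DataFrame(rows), which uses the dict keys as column names.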
