Ivan Lee

Reputation: 4261

How to Create a table with data from array output in Python

I printed out a composed array and saved it to a text file; it looks like this:

({
    ngram_a67e6f3205f0-n: 1,
    logreg_c120232d9faa-regParam: 0.01,
    cntVec_9c0e7831261d-vocabSize: 10000
},0.8580469779197205)
({
    ngram_a67e6f3205f0-n: 2,
    logreg_c120232d9faa-regParam: 0.01,
    cntVec_9c0e7831261d-vocabSize: 10000
},0.8880895806519427)
({
    ngram_a67e6f3205f0-n: 3,
    logreg_c120232d9faa-regParam: 0.01,
    cntVec_9c0e7831261d-vocabSize: 10000
},0.8656452460818544)

I would like to extract the data into a pandas DataFrame, like this:

1, 10000, 0.8580469779197205
2, 10000, 0.8880895806519427

Upvotes: 2

Views: 166

Answers (2)

Massifox

Reputation: 4487

My advice is to change the input format of your file, if possible. It would greatly simplify your life.
If this is not possible, the following code solves your problem:

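import pandas as pd
import re

# Raw strings avoid Python's invalid-escape-sequence warnings for \( and \d
pattern_tuples = r'(?<=\()[^\)]*'   # body of each "( ... )" tuple
pattern_numbers = r'[ ,](?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?'   # a number preceded by a space or comma
col_name = ['ngram', 'logreg', 'vocabSize', 'score']

with open('test.txt', 'r') as f:
    matches = re.findall(pattern_tuples, f.read())

# Strip the leading comma from each match, convert to float, one row per tuple
arr_data = [[float(val.replace(',', '')) for val in re.findall(pattern_numbers, match)] for match in matches]
df = pd.DataFrame(arr_data, columns=col_name).astype({'ngram': 'int', 'vocabSize': 'int'})

This gives: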

   ngram  logreg  vocabSize     score
0      1    0.01      10000  0.858047
1      2    0.01      10000  0.888090
2      3    0.01      10000  0.865645

Brief explanation

  1. Read the file.
  2. Use re.findall with the regex pattern_tuples to find all the tuples in the file.

  3. For each tuple, use the regex pattern_numbers to find the 4 numerical values you are interested in. This gives you a list of lists containing your data.

  4. Put the results into a pandas DataFrame.
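The steps above can be tried without a file. This is a minimal sketch that runs the same two regexes on an inline string, `text` (a stand-in for test.txt holding the first two tuples from the question), and then keeps only the ngram, vocabSize and score columns to match the layout asked for in the question. The variable names `text` and `rows` are illustrative only.

```python
import re

import pandas as pd

# Two of the tuples from the question, inlined instead of read from test.txt
text = """({
    ngram_a67e6f3205f0-n: 1,
    logreg_c120232d9faa-regParam: 0.01,
    cntVec_9c0e7831261d-vocabSize: 10000
},0.8580469779197205)
({
    ngram_a67e6f3205f0-n: 2,
    logreg_c120232d9faa-regParam: 0.01,
    cntVec_9c0e7831261d-vocabSize: 10000
},0.8880895806519427)
"""

pattern_tuples = r'(?<=\()[^\)]*'
pattern_numbers = r'[ ,](?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?'

# Step 2: the body of every "( ... )" tuple
matches = re.findall(pattern_tuples, text)
# Step 3: the four numbers in each tuple, with the leading comma stripped
rows = [[float(v.replace(',', '')) for v in re.findall(pattern_numbers, m)]
        for m in matches]
# Step 4: build the DataFrame, then show only the three columns from the question
df = pd.DataFrame(rows, columns=['ngram', 'logreg', 'vocabSize', 'score'])
df = df.astype({'ngram': 'int', 'vocabSize': 'int'})
print(df[['ngram', 'vocabSize', 'score']])
```

Because the number regex requires a space or comma right before each number, the digits embedded in keys such as ngram_a67e6f3205f0 are not picked up.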


Extra

Here's how you could save your CV results in JSON format, so you can manage them more easily:

  1. Create a list, cv_results, to hold the CV results.

  2. In each CV iteration you get a tuple t with the results; convert it into a dictionary and append it to cv_results.

  3. After the CV loop finishes, save the results in JSON format.

import json

cv_results = []
range_cv = range(3)  # placeholder: replace with your own CV loop

for _ in range_cv:  # Loop over CV runs
    # ... calculate the results of this CV run and store them in t
    t = ({'ngram_a67e6f3205f0-n': 1,
          'logreg_c120232d9faa-regParam': 0.01,
          'cntVec_9c0e7831261d-vocabSize': 10000},
         0.8580469779197205)  # FAKE DATA for this example

    # Append the results as a dict
    cv_results.append({'res': t[0], 'score': t[1]})

# Store the results in JSON format
with open('cv_results.json', 'w') as outfile:
    json.dump(cv_results, outfile, indent=4)

You can then read the JSON file back and access all the fields as in a normal Python list of dictionaries:

import json

with open('cv_results.json') as json_file:
    data = json.load(json_file)

data[0]['score']
# output: 0.8580469779197205
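Since the goal is a DataFrame, the loaded list can also be flattened directly. This is a minimal sketch assuming pandas 1.0 or later (where `pd.json_normalize` is available at the top level). It rebuilds the same list shape in memory with fake data and round-trips it through JSON in place of reading cv_results.json; `pd.json_normalize` turns each nested 'res' dict into columns named like 'res.ngram_a67e6f3205f0-n', which are then shortened to the part after the last hyphen.

```python
import json

import pandas as pd

# Same shape as the list written to cv_results.json (FAKE DATA for this example)
cv_results = [
    {'res': {'ngram_a67e6f3205f0-n': n,
             'logreg_c120232d9faa-regParam': 0.01,
             'cntVec_9c0e7831261d-vocabSize': 10000},
     'score': s}
    for n, s in [(1, 0.8580469779197205), (2, 0.8880895806519427)]
]
data = json.loads(json.dumps(cv_results))  # round-trip, as if read from the file

# Flatten the nested 'res' dicts into 'res.<param>' columns next to 'score'
df = pd.json_normalize(data)
# Shorten e.g. 'res.ngram_a67e6f3205f0-n' to 'n'; 'score' stays unchanged
df.columns = [c.split('-')[-1] for c in df.columns]
print(df)
```

Shortening the names this way relies on the parameter names themselves containing no hyphen, which holds for the keys shown here.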

Upvotes: 3

Bugbeeb

Reputation: 2161

Why not do:

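import pandas as pd

with open('file.txt') as file:
    # Join the tuples with commas and wrap them in [...] so the file is one expression
    records = eval('[' + file.read().replace(')\n(', '),(') + ']')
df = pd.DataFrame([{**params, 'score': score} for params, score in records])

eval takes a string and evaluates it as a Python expression, which is pretty nifty. Wrapping the file contents in brackets, with commas between the tuples, turns them into a list of (dict, score) pairs. Merging each dict with its score gives a list of dictionaries with identical keys, and the pandas DataFrame constructor can build a DataFrame directly from such a list. Note that this only works if the file holds valid Python literals: in the sample above the dictionary keys are not quoted, so eval would try to evaluate them as variable names and fail. Also, never use eval on untrusted input.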
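The sample file in the question is not valid Python as written (the dictionary keys are unquoted), so evaluating it directly will fail. A minimal sketch of one workaround, which swaps eval for ast.literal_eval (it only accepts literals, so it cannot run arbitrary code): quote each key with a regex, join the tuples into one list literal, then build the DataFrame. It assumes one `key: value` pair per line, as in the sample, and uses an inline string `text` instead of reading file.txt; `text`, `quoted`, `records` and `rows` are illustrative names.

```python
import ast
import re

import pandas as pd

# Two tuples in the exact format of the question's text file (unquoted keys)
text = """({
    ngram_a67e6f3205f0-n: 1,
    logreg_c120232d9faa-regParam: 0.01,
    cntVec_9c0e7831261d-vocabSize: 10000
},0.8580469779197205)
({
    ngram_a67e6f3205f0-n: 2,
    logreg_c120232d9faa-regParam: 0.01,
    cntVec_9c0e7831261d-vocabSize: 10000
},0.8880895806519427)
"""

# Quote each bare key at the start of a line: "    ngram...-n: 1" -> "    'ngram...-n': 1"
quoted = re.sub(r"^([ \t]*)([\w-]+):", r"\1'\2':", text, flags=re.MULTILINE)
# Put a comma between consecutive tuples and wrap everything in a list literal
records = ast.literal_eval('[' + quoted.replace(')\n(', '),\n(') + ']')

# Merge each parameter dict with its score, one dict per row
rows = [{**params, 'score': score} for params, score in records]
df = pd.DataFrame(rows)
# Shorten e.g. 'ngram_a67e6f3205f0-n' to 'n'
df.columns = [c.split('-')[-1] for c in df.columns]
print(df)
```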

Upvotes: 0
