Reputation: 4261
I printed out the composed array and saved it to a text file. It looks like this:
({
ngram_a67e6f3205f0-n: 1,
logreg_c120232d9faa-regParam: 0.01,
cntVec_9c0e7831261d-vocabSize: 10000
},0.8580469779197205)
({
ngram_a67e6f3205f0-n: 2,
logreg_c120232d9faa-regParam: 0.01,
cntVec_9c0e7831261d-vocabSize: 10000
},0.8880895806519427)
({
ngram_a67e6f3205f0-n: 3,
logreg_c120232d9faa-regParam: 0.01,
cntVec_9c0e7831261d-vocabSize: 10000
},0.8656452460818544)
I would like to extract the data into a pandas DataFrame, like this:
1, 10000, 0.8580469779197205
2, 10000, 0.8880895806519427
Upvotes: 2
Views: 166
Reputation: 4487
My advice is to change the input format of your file, if possible. It would greatly simplify your life.
If this is not possible, the following code solves your problem:
import pandas as pd
import re
# Everything between "(" and the next ")" is one (params, score) tuple
pattern_tuples = r'(?<=\()[^\)]*'
# A number preceded by a space or a comma (skips digits inside the stage ids)
pattern_numbers = r'[ ,](?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?'
col_name = ['ngram', 'logreg', 'vocabSize', 'score']
with open('test.txt', 'r') as f:
    matches = re.findall(pattern_tuples, f.read())
# One row of four floats per tuple
arr_data = [[float(val.replace(',', '')) for val in re.findall(pattern_numbers, match)] for match in matches]
df = pd.DataFrame(arr_data, columns=col_name).astype({'ngram': 'int', 'vocabSize': 'int'})
and gives:
   ngram  logreg  vocabSize     score
0      1    0.01      10000  0.858047
1      2    0.01      10000  0.888090
2      3    0.01      10000  0.865645
1. re.findall with the regex pattern_tuples finds all the tuples in the file.
2. For each tuple, the regex pattern_numbers finds the 4 numerical values you are interested in. This gives you a list of lists containing your data.
3. Load the results into a pandas DataFrame.
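If you would rather pick values out by parameter name than by position, the same idea can be sketched with two regexes. This is only a sketch under some assumptions: the text looks exactly like the sample in the question, each key has the form stage_uid-name, and the helper names (to_number, parse_results) are made up for this example.

```python
import re

# Sample text copied from the question.
text = """({
ngram_a67e6f3205f0-n: 1,
logreg_c120232d9faa-regParam: 0.01,
cntVec_9c0e7831261d-vocabSize: 10000
},0.8580469779197205)
({
ngram_a67e6f3205f0-n: 2,
logreg_c120232d9faa-regParam: 0.01,
cntVec_9c0e7831261d-vocabSize: 10000
},0.8880895806519427)
({
ngram_a67e6f3205f0-n: 3,
logreg_c120232d9faa-regParam: 0.01,
cntVec_9c0e7831261d-vocabSize: 10000
},0.8656452460818544)
"""

# One block: "({ ...params... },score)"
block_re = re.compile(r"\(\{(.*?)\},\s*([0-9.eE+-]+)\)", re.DOTALL)
# One parameter: "stage_uid-name: value"; capture the name after the '-'
param_re = re.compile(r"\w+-(\w+):\s*([0-9.eE+-]+)")


def to_number(token):
    # Keep plain integers as int, everything else as float.
    return int(token) if token.isdigit() else float(token)


def parse_results(raw):
    rows = []
    for params, score in block_re.findall(raw):
        row = {name: to_number(value) for name, value in param_re.findall(params)}
        row["score"] = float(score)
        rows.append(row)
    return rows


rows = parse_results(text)
for row in rows:
    print(row)
```

Each row is a dict keyed by n, regParam, vocabSize and score, so pd.DataFrame(rows)[['n', 'vocabSize', 'score']] should give the three columns asked for in the question.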
Here's how you could save your CV results in JSON format, so you can manage them more easily:
1. Create a cv_results list to keep the CV results.
2. Each CV loop gives you a tuple t with the results, which you transform into a dictionary and append to cv_results.
3. At the end of the CV loops, save the results in JSON format.
import json
cv_results = []
for _ in range_cv: # Loop CV (range_cv is a placeholder for your own CV iterations)
    # ... Calculate results of CV in t
    t = ({'ngram_a67e6f3205f0-n': 1,
          'logreg_c120232d9faa-regParam': 0.01,
          'cntVec_9c0e7831261d-vocabSize': 10000},
         0.8580469779197205) # FAKE DATA for this example
    # append the results as a dict
    cv_results.append({'res': t[0], 'score': t[1]})
# After the loop, store the results in JSON format
with open('cv_results.json', 'w') as outfile:
    json.dump(cv_results, outfile, indent=4)
Now you can read the JSON file back and access all the fields like a normal Python dictionary:
import json
with open('cv_results.json') as json_file:
    data = json.load(json_file)
data[0]['score']
# output: 0.8580469779197205
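To get from those JSON records to the table you asked for, one option is to flatten each record into a single flat dict. A minimal sketch, assuming the records have the {'res': ..., 'score': ...} shape used above; the flatten helper and the short column names are my own choice, and the JSON round trip just stands in for writing and re-reading the file:

```python
import json

# Records in the same shape as the ones appended to cv_results
# (values taken from the question).
cv_results = [
    {"res": {"ngram_a67e6f3205f0-n": 1,
             "logreg_c120232d9faa-regParam": 0.01,
             "cntVec_9c0e7831261d-vocabSize": 10000},
     "score": 0.8580469779197205},
    {"res": {"ngram_a67e6f3205f0-n": 2,
             "logreg_c120232d9faa-regParam": 0.01,
             "cntVec_9c0e7831261d-vocabSize": 10000},
     "score": 0.8880895806519427},
]

# Round-trip through JSON text to mimic saving and loading the file.
data = json.loads(json.dumps(cv_results))


def flatten(record):
    # Use the part after the last '-' as a short column name (e.g. 'n').
    row = {key.rsplit("-", 1)[-1]: value for key, value in record["res"].items()}
    row["score"] = record["score"]
    return row


flat_rows = [flatten(record) for record in data]
print(flat_rows[0])
```

pd.DataFrame(flat_rows) then gives one column per parameter plus score. pd.json_normalize(data) is an alternative that keeps the full key names, prefixed with 'res.'.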
Upvotes: 3
Reputation: 2161
Why not do:
import pandas as pd
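with open('file.txt') as file:
    # Assumes one tuple per line, written as a valid Python literal (quoted keys)
    rows = [eval(line) for line in file if line.strip()]
df = pd.DataFrame([{**params, 'score': score} for params, score in rows])
eval takes a string and evaluates it as a Python expression, which is pretty nifty. Each parenthesized entry becomes a (dict, score) tuple, and the tuples are collected in a list. The pd.DataFrame constructor can take a list of dictionaries with identical keys and build a DataFrame from it. Note that this only works if the file contains valid Python literals: the output shown in the question has unquoted keys and spreads each tuple over several lines, so it would have to be written out differently first. Also, eval executes whatever the file contains, so only use it on files you trust.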
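Because eval runs arbitrary code, ast.literal_eval from the standard library is a safer way to do the same thing, since it only accepts literals (dicts, tuples, numbers, strings and so on). A small sketch, assuming a hypothetical file layout with one tuple per line and quoted keys, which is not the layout shown in the question:

```python
import ast

# Hypothetical file content: one valid Python tuple literal per line.
lines = [
    "({'n': 1, 'regParam': 0.01, 'vocabSize': 10000}, 0.8580469779197205)",
    "({'n': 2, 'regParam': 0.01, 'vocabSize': 10000}, 0.8880895806519427)",
]


def parse_line(line):
    # literal_eval only parses literals, so it cannot execute code from the file.
    params, score = ast.literal_eval(line)
    return {**params, "score": score}


literal_rows = [parse_line(line) for line in lines]
print(literal_rows)
```

The resulting list of dicts can be passed straight to pd.DataFrame(literal_rows).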
Upvotes: 0