I have two text files with contents like the following:
/*foo1.txt*/
Number of data records: 1000
Number of attributes: 231
Class attribute index: 231
Monotonic Transformation: None
Number of class labels: 10
Number of folds: 10
Test fold: 1
Random seed: 0
(Dis)similarity measure: Test_SVM
Task: SVMi
Number of bins (b): 10
Histogram type: EF
Number of trees (T): 0 (For tree-based methods.)
Sample size (W): 0 (For tree-based methods.)
Running Experiment... Please wait...
#Atts. considered as irrelevant: 0
Data size: 900; Query size: 100
Dimensionality of the space: 230
... using Test SVM for SVM ...
... Equal Frequency discretisation (b=10) ...
Max. num. of bins: 10, Min. num. of bins: 10
SVM Classification accuarcy scores (C=0.1): 0.5300
SVM Classification accuarcy scores (C=0.5): 0.6300
SVM Classification accuarcy scores (C=10): 0.7300
SVM Classification accuarcy scores (C=100): 0.7300
Done!
Total runtime: 6.8169 second.
/*foo2.txt*/
Number of data records: 1000
Number of attributes: 231
Class attribute index: 231
Monotonic Transformation: None
Number of class labels: 10
Number of folds: 10
Test fold: 1
Random seed: 0
(Dis)similarity measure: Test_SVM
Task: SVM
Number of bins (b): 30
Histogram type: EF
Number of trees (T): 0 (For tree-based methods.)
Sample size (W): 0 (For tree-based methods.)
Running Experiment... Please wait...
#Atts. considered as irrelevant: 0
Data size: 900; Query size: 100
Dimensionality of the space: 230
... using Test SVM for SVM ...
... Equal Frequency discretisation (b=30) ...
Max. num. of bins: 30, Min. num. of bins: 30
SVM Classification accuarcy scores (C=0.1): 0.6600
SVM Classification accuarcy scores (C=0.5): 0.7400
SVM Classification accuarcy scores (C=10): 0.8000
SVM Classification accuarcy scores (C=100): 0.8000
Done!
Total runtime: 8.2947 second.
The goal is to load the contents of the two text files (foo1.txt and foo2.txt) into a pandas DataFrame, df.
How can I extract the values from these files into such a DataFrame?
EDIT: The structure of the text in the actual .txt files was different, so I have edited the question to reflect the data in the actual text files.
Update (based on the text files you shared in the comments and our discussion):
Use a regular expression to extract the relevant sections from each file's text, then use a second regex to find all column/value pairs and map them into a dictionary, giving one record per file. Note: I assumed that `data` is the folder containing the text files; replace it with your actual folder.
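The second step (finding the column/value pairs) can be tried on its own. This minimal sketch runs the pair-extraction pattern used in `read_files()` on a short excerpt of foo1.txt; the excerpt and the names `HEADER`, `ORIGINAL` and `SAME_LINE` are illustrative. It also shows a side effect of that pattern: `\s*` can match a line break, so when the next line starts with `(`, as `(Dis)similarity measure` does, the match for the previous value consumes that `(` and the `(Dis)similarity measure` line is never captured. If you want that field as well, one option (an assumption about the desired output, not part of the original answer) is to allow only spaces and tabs with `[ \t]*`.

```python
import re

# Short excerpt of the header section of foo1.txt (illustrative sample)
HEADER = (
    "Random seed: 0\n"
    "(Dis)similarity measure: Test_SVM\n"
    "Task: SVMi\n"
    "Number of trees (T): 0 (For tree-based methods.)\n"
)

# Pattern used in read_files(): "\s*" can also match a line break
ORIGINAL = r'^(.*?)\s*:\s*(.*?)\s*(?:\(|$)'
# Variant that only skips spaces and tabs, so a match stays on one line
SAME_LINE = r'^(.*?)[ \t]*:[ \t]*(.*?)[ \t]*(?:\(|$)'

original_pairs = dict(re.findall(ORIGINAL, HEADER, re.MULTILINE))
same_line_pairs = dict(re.findall(SAME_LINE, HEADER, re.MULTILINE))

print(original_pairs)   # no '(Dis)similarity measure' key
print(same_line_pairs)  # includes '(Dis)similarity measure': 'Test_SVM'
```

With both patterns, the trailing `(For tree-based methods.)` note is dropped and only `0` is kept as the value of `Number of trees (T)`.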
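```python
import re
from pathlib import Path
import pandas as pd

def read_files():
    for file in Path('data').glob('*.txt'):
        data = file.read_text()
        # group(1): header lines before "Running Experiment"; group(2): the "SVM Classification ..." lines up to "Done!"
        m = re.search(r'(.*?)Running Exp.*?(?=SVM Class)(.*?)Done!', data, re.DOTALL)
        if m is None:  # skip files that don't follow this layout
            continue
        # "key: value" pairs from the header; a trailing "(...)" note is dropped
        c = re.findall(r'^(.*?)\s*:\s*(.*?)\s*(?:\(|$)', m.group(1), re.MULTILINE)
        yield {**dict(c), 'Results': m.group(2).strip()}

df = pd.DataFrame(read_files())
```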
| | Number of data records | Number of attributes | Class attribute index | Monotonic Transformation | Number of class labels | Number of folds | Test fold | Random seed | Task | Number of bins (b) | Histogram type | Number of trees (T) | Sample size (W) | Results |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1000 | 231 | 231 | None | 10 | 10 | 1 | 0 | SVM | 30 | EF | 0 | 0 | SVM Classification accuarcy scores (C=0.1): 0.6600\nSVM Classification accuarcy scores (C=0.5): 0.7400\nSVM Classification accuarcy scores (C=10): 0.8000\nSVM Classification accuarcy scores (C=100): 0.8000 |
| 1 | 1000 | 231 | 231 | None | 10 | 10 | 1 | 0 | SVMi | 10 | EF | 0 | 0 | SVM Classification accuarcy scores (C=0.1): 0.5300\nSVM Classification accuarcy scores (C=0.5): 0.6300\nSVM Classification accuarcy scores (C=10): 0.7300\nSVM Classification accuarcy scores (C=100): 0.7300 |
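The `Results` column holds all score lines of a file as one string. If each score should be its own numeric column, the lines can be parsed with one more pattern. This is a sketch under assumptions: it expects every score line to end in `(C=<value>): <score>`, as in the files above, and the small stand-in DataFrame, the `split_scores` helper and the `C=<value>` column names are hypothetical choices for illustration.

```python
import re

import pandas as pd

# Hypothetical stand-in for the df built by read_files(), using values from the question
df = pd.DataFrame({
    'Task': ['SVM', 'SVMi'],
    'Results': [
        'SVM Classification accuarcy scores (C=0.1): 0.6600\n'
        'SVM Classification accuarcy scores (C=100): 0.8000',
        'SVM Classification accuarcy scores (C=0.1): 0.5300\n'
        'SVM Classification accuarcy scores (C=100): 0.7300',
    ],
})

# Capture the C value and the score from each "... (C=<c>): <score>" line
SCORE = re.compile(r'\(C=([^)]+)\):\s*([0-9.]+)')


def split_scores(results):
    """Return {'C=<c>': score} for every score line in one Results string."""
    return {f'C={c}': float(score) for c, score in SCORE.findall(results)}


# One row of scores per original row, aligned on the same index
scores = pd.DataFrame([split_scores(r) for r in df['Results']], index=df.index)
df = df.drop(columns='Results').join(scores)
print(df)
```

Applied to the real `df`, the last two assignments would replace `Results` with one float column per C value (`C=0.1`, `C=0.5`, `C=10`, `C=100`). The pattern does not depend on the "accuarcy" spelling in the files.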