Regex text to pandas dataframe

Question

I have a text file that contains multiple lines in the format given below:

real    0m0.020s
user    0m0.000s
sys 0m0.000s
Round  1  completed. with matrix size of  1200 x 1200 with threads 8

real    0m0.022s
user    0m0.000s
sys 0m0.001s
Round  2  completed. with matrix size of  1200 x 1200 with threads 8

There are about 500 entries of this sort (the above shows two of them). I can't seem to figure out how to get them into a pandas dataframe that might look something like this:

Matrix Size    Threads    Round    Real    User    Sys
1200 x 1200    8          1        0.0020  0.0000  0.0000
1200 x 1200    8          2        0.0022  0.0000  0.0001

Is there a way, using regex or some other method, to convert the test output into a dataframe? Additionally, I'm not sure I interpreted the times correctly: I read the 0m as 0 minutes and the 0.02 as 0.02 seconds.

gmds · Accepted Answer

You can use a regex:

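import re
import pandas as pd

# `data` is assumed to hold the full contents of the text file as one string.

# Each entry spans four consecutive lines, so the pattern joins them with \n.
regex = re.compile(
    r'real +(\dm\d\.\d+s)\n'
    r'user +(\dm\d\.\d+s)\n'
    r'sys +(\dm\d\.\d+s)\n'
    r'Round +(\d+).+of +(\d+ x \d+).+threads (\d+)'
)

# findall returns one tuple of captured groups per entry, i.e. one row each.
df = pd.DataFrame(regex.findall(data), columns=['real', 'user', 'sys', 'round', 'matrix size', 'threads'])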

print(df)

Output:

       real      user       sys round  matrix size threads
0  0m0.020s  0m0.000s  0m0.000s     1  1200 x 1200       8
1  0m0.022s  0m0.000s  0m0.001s     2  1200 x 1200       8
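
On the timing question: 0m0.020s does read as 0 minutes and 0.020 seconds, so the first entry would be 0.020 s of real time. The columns above are still strings, so a possible follow-up is to convert them to numbers. The sketch below is one way to do that, not part of the original answer; the to_seconds helper is a hypothetical name, the sample text is copied from the question, and the pattern is loosened slightly (\s+ and \d+m) on the assumption that the real file may use tabs or have runs longer than 9 minutes.

```python
import re
import pandas as pd

# Sample text copied from the question; in practice `data` would be read from the file.
data = """real    0m0.020s
user    0m0.000s
sys 0m0.000s
Round  1  completed. with matrix size of  1200 x 1200 with threads 8

real    0m0.022s
user    0m0.000s
sys 0m0.001s
Round  2  completed. with matrix size of  1200 x 1200 with threads 8
"""

# Assumption: \s+ tolerates tabs or spaces, \d+m tolerates multi-digit minutes.
pattern = re.compile(
    r'real\s+(\d+m[\d.]+s)\n'
    r'user\s+(\d+m[\d.]+s)\n'
    r'sys\s+(\d+m[\d.]+s)\n'
    r'Round\s+(\d+).+of\s+(\d+ x \d+).+threads (\d+)'
)

df = pd.DataFrame(
    pattern.findall(data),
    columns=['real', 'user', 'sys', 'round', 'matrix size', 'threads'],
)


def to_seconds(stamp):
    """Convert a string such as '1m2.500s' to a float number of seconds."""
    minutes, seconds = stamp.rstrip('s').split('m')
    return int(minutes) * 60 + float(seconds)


# Turn the three timing columns into seconds and the counters into integers.
for col in ['real', 'user', 'sys']:
    df[col] = df[col].map(to_seconds)
df[['round', 'threads']] = df[['round', 'threads']].astype(int)

print(df)
```

After this, real, user and sys are floats in seconds, so they can be averaged or grouped by matrix size and threads directly.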
