user9991
user9991

Reputation: 63

How to parse text in a table format shown to a csv file?

I want to convert a text file with data in table like format to csv. The table looks like this

name                     type             attribute
------------------ -------------------- -----------=
---------
oscar            dog              can bark and can
roll over
bruno            dog              can bark
buddy            dog              can bark and can
roll over

The csv should look like this

name,type,attribute
oscar,dog,can bark and can roll over
bruno,dog,can bark
buddy,dog,can bark and can roll over

How can I accomplish this?

Upvotes: 1

Views: 345

Answers (1)

Andrej Kesely
Andrej Kesely

Reputation: 195438

It seems that somebody saved the document with word-wrap option turned on.

You can try this script to convert the text to DataFrame (I'm supposing the columns are separated by more than two spaces):

import re
import pandas as pd


txt = '''name                type              attribute
----------------- ----------------------=
---------------- --------------------------=
oscar                  dog               can bark
and can roll over

bruno                  dog              can bark

buddy                 dog             can bark and can 
roll over'''

is_row = re.compile(r'\s{2,}')
is_dashed_line = re.compile(r'-{3,}')

data = []
for line in map(str.strip, txt.splitlines()):
    if not line or is_dashed_line.search(line):
        continue
    if is_row.search(line):
        data.append( is_row.split(line) )
    else:
        data[-1][-1] += ' ' + line

df = pd.DataFrame(data[1:], columns=data[0])
print(df)

Prints:

    name type                   attribute
0  oscar  dog  can bark and can roll over
1  bruno  dog                    can bark
2  buddy  dog  can bark and can roll over

Upvotes: 1

Related Questions