Reputation: 63
I want to convert a text file with data in table like format to csv. The table looks like this
name type attribute
------------------ -------------------- -----------=
---------
oscar dog can bark and can
roll over
bruno dog can bark
buddy dog can bark and can
roll over
The csv should look like this
name,type,attribute
oscar,dog,can bark and can roll over
bruno,dog,can bark
buddy,dog,can bark and can roll over
How can I accomplish this?
Upvotes: 1
Views: 345
Reputation: 195438
It seems that somebody saved the document with word-wrap
option turned on.
You can try this script to convert the text to DataFrame (I'm supposing the columns are separated by more than two spaces):
import re
import pandas as pd
txt = '''name type attribute
----------------- ----------------------=
---------------- --------------------------=
oscar dog can bark
and can roll over
bruno dog can bark
buddy dog can bark and can
roll over'''
is_row = re.compile(r'\s{2,}')
is_dashed_line = re.compile(r'-{3,}')
data = []
for line in map(str.strip, txt.splitlines()):
if not line or is_dashed_line.search(line):
continue
if is_row.search(line):
data.append( is_row.split(line) )
else:
data[-1][-1] += ' ' + line
df = pd.DataFrame(data[1:], columns=data[0])
print(df)
Prints:
name type attribute
0 oscar dog can bark and can roll over
1 bruno dog can bark
2 buddy dog can bark and can roll over
Upvotes: 1