Reputation: 466
I have a file containing a table (tbl extension). Its contents look like this:
Gibberish Gibberish Gibberish
{Group}
Name = 'Messi'
Height = 170 cm
Weight = 72 kg
{End Group}
{Group}
Name = 'Ronaldo'
Height = 187 cm
Weight = 84 kg
{End Group}
How can I read this to a pandas dataframe? I want to merge this with another file. I would like the output to be similar to this:
height weight
messi 170 72
ronaldo 187 84
I looked into pandas read_table
but to no avail.
Any help is appreciated.
Upvotes: 0
Views: 2673
Reputation: 4417
I wrote a function to generalise
import pandas as pd
import re
def read_custom_table(filename,
rec_st_lim='{',
rec_end_lim='}',
k_v_sep=':',
item_sep=',',
index_col=None):
"""
This function takes a text file name as input,
read the text and extracts records
and returns a pandas dataframe
Inputs
---------------
filename: string containing system file name
rec_st_lim: string of varied length(1+) marking the start of
a single record
rec_end_lim: string of varied length(1+) marking the end of
a single record
k_v_sep: key-value seperator within a an arbitray record.
item_sep: item seperator, seperates key/value pairs
index_col: the name of the column to use as index, default =None
i.e. index is a numerical range
----------------
Output: df, a dataframe with columns = the keys in an arbitrary
record and index = index_col when index_col is not None
"""
pattern = r"{}(.*?){}".format(rec_st_lim, rec_end_lim)
with open(filename) as f:
df = pd.DataFrame(
list
(map
(lambda rec:
dict([(el.strip() for el in r.split(k_v_sep))
for r in rec.split(item_sep) if len(r) > 1]),
re.findall(pattern, f.read(), re.DOTALL)
)
)
)
f.close()
if index_col:
df.set_index(index_col, inplace=True)
return df
the function can be used on the data in OP example as follows
df = read_custom_table('debug.txt',
rec_st_lim='\{Group\}',
rec_end_lim='\{End Group\}',
k_v_sep='=',
item_sep='\n',
index_col='Name')
print(df)
The output will
Height Weight
Name
'Messi' 170 cm 72 kg
'Ronaldo' 187 cm 84 kg
Upvotes: 2
Reputation: 82765
One way to accomplish what you are doing is to do a string manuplation and convert the data to a list of dictionary and then convert it to a dataframe.
Example:
import pandas as pd
stringVal = ''
with open("Path to inputfile", "r") as infile: #I have the data that you have posted in you question as content in input file
for i in infile.readlines():
if i.startswith("Name"):
stringVal += (i+"|").replace("\n", "").replace("'", "")
if i.startswith("Height"):
stringVal += (i+"|").replace("\n", "")
if i.startswith("Weight"):
stringVal += i+"\n"
res = []
for i in stringVal.strip().split("\n"):
if i:
d = {}
for j in i.split("|"):
val = j.split("=")
d[val[0].strip()] = val[1].strip()
res.append(d)
df = pd.DataFrame(res)
df = df.set_index('Name')
print df
Output:
Height Weight
Name
Messi 170 cm 72 kg
Ronaldo 187 cm 84 kg
Upvotes: 0