Ali
Ali

Reputation: 466

Read table into dataframe in pandas

I have a file containing a table (tbl extension). Its contents look like this:

Gibberish Gibberish Gibberish 
{Group}
Name = 'Messi'
Height = 170 cm
Weight = 72 kg
{End Group}
{Group}
Name = 'Ronaldo'
Height = 187 cm
Weight = 84 kg
{End Group}

How can I read this to a pandas dataframe? I want to merge this with another file. I would like the output to be similar to this:

      height   weight
messi   170      72
ronaldo 187      84

I looked into pandas read_table but to no avail.

Any help is appreciated.

Upvotes: 0

Views: 2673

Answers (2)

sgDysregulation
sgDysregulation

Reputation: 4417

I wrote a function to generalise

import pandas as pd
import re


def read_custom_table(filename,
                      rec_st_lim='{',
                      rec_end_lim='}',
                      k_v_sep=':',
                      item_sep=',',
                      index_col=None):
    """
    This function takes a text file name as input,
    read the text and extracts records
    and returns a pandas dataframe
    Inputs
    ---------------
    filename:  string containing system file name

    rec_st_lim: string of varied length(1+) marking the start of
    a single record

    rec_end_lim: string of varied length(1+) marking the end of
    a single record

    k_v_sep: key-value seperator within a an arbitray record.

    item_sep: item seperator, seperates key/value pairs

    index_col: the name of the column to use as index, default =None
    i.e. index is a numerical range
    ----------------
    Output: df, a dataframe with columns = the keys in an arbitrary
    record and index = index_col when index_col is not None

   """

    pattern = r"{}(.*?){}".format(rec_st_lim, rec_end_lim)

    with open(filename) as f:
        df = pd.DataFrame(
            list
            (map
             (lambda rec:
              dict([(el.strip() for el in r.split(k_v_sep))
                    for r in rec.split(item_sep) if len(r) > 1]),
              re.findall(pattern, f.read(), re.DOTALL)
              )
             )
        )
        f.close()
    if index_col:
        df.set_index(index_col, inplace=True)
    return df

the function can be used on the data in OP example as follows

df = read_custom_table('debug.txt',
                                                 rec_st_lim='\{Group\}',
                                                 rec_end_lim='\{End Group\}',
                                                 k_v_sep='=',
                                                 item_sep='\n',
                                                 index_col='Name')
print(df)

The output will

           Height Weight
Name                    
'Messi'    170 cm  72 kg
'Ronaldo'  187 cm  84 kg

Upvotes: 2

Rakesh
Rakesh

Reputation: 82765

One way to accomplish what you are doing is to do a string manuplation and convert the data to a list of dictionary and then convert it to a dataframe.

Example:

import pandas as pd

stringVal = ''
with open("Path to inputfile", "r") as infile:   #I have the data that you have posted in you question as content in input file
    for i in infile.readlines():
        if i.startswith("Name"):
            stringVal += (i+"|").replace("\n", "").replace("'", "")
        if i.startswith("Height"):
            stringVal += (i+"|").replace("\n", "")
        if i.startswith("Weight"):
            stringVal += i+"\n" 

res = []    
for i in stringVal.strip().split("\n"):
    if i:
        d = {}
        for j in i.split("|"):
            val = j.split("=")
            d[val[0].strip()] = val[1].strip()
        res.append(d)

df = pd.DataFrame(res)
df = df.set_index('Name') 
print df

Output:

         Height Weight
Name                  
Messi    170 cm  72 kg
Ronaldo  187 cm  84 kg

Upvotes: 0

Related Questions