jaydeepsb

Reputation: 557

Pandas extract comment lines

I have a data file that starts with a few lines of comments, followed by the actual data.

#param1 : val1
#param2 : val2
#param3 : val3
12
2
1
33
12
0
12
...

I can read the data with pandas.read_csv(filename, comment='#', header=None). However, I also want to read the comment lines separately in order to extract the parameter values. So far I have only come across ways to skip or remove the comment lines. How can I also extract them separately?

Upvotes: 5

Views: 3555

Answers (2)

Warren Wong

Reputation: 31

You could read the file a second time in the ordinary way, going through it line by line to collect your parameters.

def get_param(filename):
    para_dic = {}
    with open(filename, 'r') as cmt_file:    # open file
        for line in cmt_file:    # read each line
            if line.startswith('#'):    # check the first character
                line = line[1:]    # remove the leading '#'
                para = line.split(':')    # separate string by ':'
                if len(para) == 2:    # keep only 'name : value' lines
                    para_dic[para[0].strip()] = para[1].strip()
    return para_dic

This function returns a dictionary containing the parameters.

{'param3': 'val3', 'param2': 'val2', 'param1': 'val1'}
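
As a usage sketch, the same idea can be tried on an in-memory copy of the sample file from the question. The helper name parse_comment_params and the use of io.StringIO are assumptions made for illustration only. It also uses str.partition instead of split, so a value that itself contains ':' is kept whole, which differs slightly from the function above.

```python
import io

# In-memory stand-in for the data file shown in the question.
SAMPLE = """#param1 : val1
#param2 : val2
#param3 : val3
12
2
1
"""

def parse_comment_params(lines, comment="#", sep=":"):
    """Hypothetical helper: collect 'key : value' pairs from comment lines."""
    params = {}
    for line in lines:
        if not line.startswith(comment):
            continue  # data line, ignored here
        # partition splits on the first separator only
        key, found, value = line[len(comment):].partition(sep)
        if found:  # skip comment lines that have no separator
            params[key.strip()] = value.strip()
    return params

params = parse_comment_params(io.StringIO(SAMPLE))
print(params)
```

Because the helper accepts any iterable of lines, an open file object can be passed in the same way as the StringIO used here.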

Upvotes: 2

Elliot

Reputation: 2690

You can't really do this within the call to read_csv. If you're just processing a header, you can open the file, extract the commented lines and process them, then read in the data in a separate call.

import pandas
from itertools import takewhile

with open(filename, 'r') as fobj:
    # takewhile returns an iterator over the leading lines
    # that start with the comment string
    headiter = takewhile(lambda s: s.startswith('#'), fobj)
    # you may want to process the headers differently,
    # but here we just convert it to a list
    header = list(headiter)
# read the data separately, skipping the comment lines as in the question
df = pandas.read_csv(filename, comment='#', header=None)
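
One detail about takewhile is worth knowing: it has to read the first non-comment line to learn that the header has ended, and that line is then dropped. This is why the data is read in a separate call on the file name rather than from the same file object. Below is a minimal sketch of a single-pass alternative, assuming the comments form one block at the top of the file. The helper name split_header is hypothetical, and io.StringIO stands in for a real file.

```python
import io
from itertools import chain

def split_header(fobj, comment="#"):
    """Hypothetical helper: return (header_lines, iterator over the rest)."""
    header = []
    for line in fobj:
        if line.startswith(comment):
            header.append(line.rstrip("\n"))
        else:
            # put the first data line back in front of the remaining lines
            return header, chain([line], fobj)
    return header, iter(())  # the input held comment lines only

# In-memory stand-in for the data file from the question.
sample = io.StringIO("#param1 : val1\n#param2 : val2\n12\n2\n1\n")
header, rest = split_header(sample)
data = [int(x) for x in rest]
print(header)
print(data)
```

Here the first data line is kept, so nothing is lost between the header and the values that follow it.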

Upvotes: 5
