Read pandas dataframe from csv beginning with non-fix header

Question

I have a number of data files produced by some rather hackish script used in my lab. The script is quite entertaining in that the number of lines it appends before the header varies from file to file (though they are of the same format and have the same header).

I am writing a batch script to load all of these files into dataframes. How can I make pandas identify the correct header if I do not know its position? I know the exact header text, and the text of the two lines that come directly before it (they are the only consecutive instances of \n in the document).

I have tried setting skipfooter to 0 and reading only the (thankfully) fixed number of data rows each file contains:

df = pd.read_csv(myfile, skipfooter=0, nrows=267)

That did not work.

Do you have any further ideas?

alko · Accepted Answer

You can open the file and iterate over it until two consecutive empty lines are met, then pass the rest of the file to the parser, i.e.

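import pandas as pd

# Text mode, so each line is a str and '\r\n' endings are normalized to '\n'
with open(csv_file_name) as source:
    consec_empty_lines = 0
    for line in source:
        if line == '\n':
            consec_empty_lines += 1
            # Two empty lines in a row: the header is the next line
            if consec_empty_lines == 2:
                break
        else:
            consec_empty_lines = 0
    # The handle now sits at the header, so parsing starts there
    df = pd.read_csv(source)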
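
Since the question says the exact header text is known, another option is to search for that line directly instead of counting empty lines. The following is only a minimal sketch of that idea: find_header_row, its parameters and the utf-8 default are hypothetical names and assumptions chosen for illustration, not part of pandas or of the original answer.

```python
def find_header_row(path, header_text, encoding="utf-8"):
    """Return the 0-based index of the first line equal to header_text.

    Line endings are ignored in the comparison.
    Raises ValueError if no line matches.
    """
    with open(path, encoding=encoding) as source:
        for index, line in enumerate(source):
            # Strip only the line ending, so the match on the text stays exact
            if line.rstrip("\r\n") == header_text:
                return index
    raise ValueError(f"header {header_text!r} not found in {path}")
```

The returned index could then be passed to the parser as the number of leading lines to skip, for example pd.read_csv(myfile, skiprows=find_header_row(myfile, header_text), nrows=267), where header_text stands for the known header line. This reads the start of each file twice, which should be negligible for files of a few hundred rows.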
