Reputation: 17924
I have a number of data files produced by some rather hackish script used in my lab. The script is quite entertaining in that the number of lines it appends before the header varies from file to file (though they are of the same format and have the same header).
I am writing a batch script to process all of these files into dataframes. How can I make pandas identify the correct header if I do not know its position? I know the exact header text, and the text of the two lines that come directly before it (they are the only consecutive instances of \r\n in the document).
I have tried setting skipfooter to 0 and reading only the (thankfully) fixed number of data rows each file contains:
df = pd.read_csv(myfile, skipfooter=0, nrows=267)
That did not work.
Do you have any further ideas?
Upvotes: 3
Views: 1490
Reputation: 48317
You can open the file yourself, iterate over it until you hit two consecutive \r\n lines, and then pass the file object (now positioned at the header) to the parser, e.g.
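import pandas as pd

# Binary mode keeps the raw '\r\n' line endings, so compare against bytes.
with open(csv_file_name, 'rb') as source:
    consec_empty_lines = 0
    for line in source:
        if line == b'\r\n':
            consec_empty_lines += 1
            if consec_empty_lines == 2:
                # The next line in the file is the header.
                break
        else:
            consec_empty_lines = 0
    # Parse while the file is still open; reading starts at the header.
    df = pd.read_csv(source)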
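Since the exact header text is also known, another option is to look for the header line itself and tell pandas how many lines to skip. The following is only a minimal sketch: the helper name `count_preamble_lines`, its `header_text` argument, and the UTF-8 default encoding are assumptions made for illustration, not part of pandas or of the original script.

```python
def count_preamble_lines(path, header_text, encoding="utf-8"):
    """Return the zero-based index of the first line equal to header_text.

    Hypothetical helper: header_text is the exact header line without its
    line terminator, and the file is assumed to be decodable as `encoding`.
    """
    # newline="" leaves '\r\n' untouched; the terminator is stripped below.
    with open(path, "r", encoding=encoding, newline="") as source:
        for index, line in enumerate(source):
            if line.rstrip("\r\n") == header_text:
                return index
    raise ValueError(f"header line not found in {path!r}")
```

The returned count could then be passed on, for example as pd.read_csv(myfile, skiprows=count_preamble_lines(myfile, header_text), nrows=267), assuming that skiprows given as an integer counts physical lines from the top of the file, blank lines included. Unlike the blank-line scan, this does not depend on the two empty lines being present, but it does require the header text to match exactly.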
Upvotes: 3