Parsing CSV headers

Question

My requirement is to read multiple CSV files that have the data I'm interested in after n number of rows. This number n is not constant, as it varies with different CSVs (hence, I cannot use skiprows).

The format of the CSV is as follows :

 Test: Rate1, "2" , units
 specimen: Rectangular, "3", units

Time, Estimate, Load
(s) , (units) , (N)
"1","2","4"
"5","8","12"

Another CSV would probably be :

 Test: Rate1, "2" , units
 specimen: Rectangular, "3" , units
 value_based : Sample7, "9" , product
 Test_condition: controlled, "0" , test


Time, Estimate, Load
(s) , (units) , (N)
"12","6","8"
"18","3","2"

But, the only column names I'm interested in are : [Time, Estimate , Load].

I want to do the following:

Derive data with specifying headers to be Time, Estimate and Load.
Skip the first row of values ( (s) , (units), (N) ) as I want to concatenate these with the headers and rename them to be Time(s) , Estimate(units), Load(N).

This is what I have tried:

with open(file,"r+",newline="") as csvFile:
    dictReader = csv.DictReader(csvFile)
    for row in dictReader:
        print(row["Time"], row["Load"], row["Extension"])

df = pd.read_csv(file,usecols=["Time","Load","Extension"])
print(df["Time"].head(3))

Please suggest how I would need to proceed to be getting the data with my intended headers. Thanks in advance.

Martin Evans · Accepted Answer

I don't think Pandas itself is able to determine the correct starting row, but it could be done with a little preparsing of the file to determine the correct row. For example:

import pandas as pd
import csv

filename = 'test.csv'
header_row = ["Time", "Estimate", "Load"]

with open(filename, newline='') as f_csv:
    for row_number, row in enumerate(csv.reader(f_csv), start=-1):
        if row == header_row:
            break

df = pd.read_csv(filename, skiprows=row_number, names=header_row)
print(df)

Giving:

   Time  Estimate  Load
0     1         2     4
1     5         8    12

Parsing CSV headers

Answers (1)

Related Questions