Reputation: 33
My requirement is to read multiple CSV files that have the data I'm interested in after n
number of rows. This number n
is not constant, as it varies with different CSVs (hence, I cannot use skiprows).
The format of the CSV is as follows :
Test: Rate1, "2" , units
specimen: Rectangular, "3", units
Time, Estimate, Load
(s) , (units) , (N)
"1","2","4"
"5","8","12"
Another CSV would probably be :
Test: Rate1, "2" , units
specimen: Rectangular, "3" , units
value_based : Sample7, "9" , product
Test_condition: controlled, "0" , test
Time, Estimate, Load
(s) , (units) , (N)
"12","6","8"
"18","3","2"
But, the only column names I'm interested in are : [Time
, Estimate
, Load
].
I want to do the following:
Derive data with specifying headers to be Time
, Estimate
and Load
.
Skip the first row of values ( (s) , (units), (N)
) as I want to concatenate these with the headers and rename them to be
Time(s) , Estimate(units), Load(N)
.
This is what I have tried:
with open(file,"r+",newline="") as csvFile:
dictReader = csv.DictReader(csvFile)
for row in dictReader:
print(row["Time"], row["Load"], row["Extension"])
df = pd.read_csv(file,usecols=["Time","Load","Extension"])
print(df["Time"].head(3))
Please suggest how I would need to proceed to be getting the data with my intended headers. Thanks in advance.
Upvotes: 2
Views: 792
Reputation: 46759
I don't think Pandas itself is able to determine the correct starting row, but it could be done with a little preparsing of the file to determine the correct row. For example:
import pandas as pd
import csv
filename = 'test.csv'
header_row = ["Time", "Estimate", "Load"]
with open(filename, newline='') as f_csv:
for row_number, row in enumerate(csv.reader(f_csv), start=-1):
if row == header_row:
break
df = pd.read_csv(filename, skiprows=row_number, names=header_row)
print(df)
Giving:
Time Estimate Load
0 1 2 4
1 5 8 12
Upvotes: 2