Reputation: 111
I wrote a Python script which opens a list of CSV files containing mass spectrometry data, reads the data with numpy.genfromtxt, runs calculations on these data using statsmodels, and returns the results in a compiled Excel file. Inside each CSV file, the header and the internal structure may vary in size depending on the running conditions of the experiment.
For now I use a config file which I read with configparser, and I use different config files for different experimental conditions. However this is pretty clunky.
What I want to do is to measure the header size and the length of the dataframe, instead of reading it from a config file. The data for each isotope starts with a string, such as:
*#ISOTOPE, 'Ar36:L2S1'* and *#ISOTOPE, 'Ar37:L1S1'*
followed by the data for each isotope (3 columns), for example:
*#ISOTOPE, 'Ar36:L2S1'*
No, Time, Intensity
1, 101.4685919, 1.845379369941e-003
2, 102.4901003, 2.153738546096e-003
.....
599, 701.1342959, 2.087938052439e-003
600, 702.1343039, 2.000204060898e-003
(blank line)
*#ISOTOPE, 'Ar37:L1S1'*
No, Time, Intensity
1, 101.4685919, -1.103785922163e-004
2, 102.4901003, 3.526673114000e-004
etc.
I want to determine the row number of the data and the length of the data for each isotope.
When I try to import the whole data file without skipping the headers (to count the row index), I get errors related to the number of columns. I tried usecols=1 to ignore the other columns, but this does not work either (it raises a ValueError).
I assume there is a simple solution to this, but my programming skills are not very good so far.
Can anyone help?
Cheers
Upvotes: 1
Views: 2861
Reputation: 111
OK, Masklinn pointed me in the right direction. The following code returns the index of the sections I am looking for:
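    import glob

    FileList = glob.glob("*.csv")
    for FileToProcess in FileList:
        # The with block closes the file automatically, so no explicit close() is needed.
        with open(FileToProcess) as readfile:
            for cnt, line in enumerate(readfile):
                # cnt is the zero-based row index of each section marker
                if "#ISOTOPE" in line:
                    print("Line {}:{}".format(cnt, line))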
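To also get the length of each data block, the same scan can count the rows that follow each marker. This is only a minimal sketch: the function name find_isotope_sections is made up for illustration, and it assumes each *#ISOTOPE* line is followed by exactly one column-header line and that a block ends at the first blank line or at the end of the file.

```python
def find_isotope_sections(lines):
    """Return a list of (label, first_data_row, n_rows), one per #ISOTOPE section.

    Assumption: the line right after each marker is the "No, Time, Intensity"
    column header, and a blank line (or end of input) closes the section.
    Row indices are zero-based.
    """
    sections = []
    current = None  # [label, first_data_row, n_rows] of the section being read
    for idx, line in enumerate(lines):
        text = line.strip()
        if "#ISOTOPE" in text:
            if current is not None:
                sections.append(tuple(current))
            # "#ISOTOPE, 'Ar36:L2S1'" -> "Ar36:L2S1"
            label = text.split(",", 1)[-1].strip(" '*")
            current = [label, idx + 2, 0]  # skip marker and column-header lines
        elif current is not None and idx >= current[1]:
            if text:
                current[2] += 1  # one more data row
            else:
                sections.append(tuple(current))  # blank line closes the section
                current = None
    if current is not None:
        sections.append(tuple(current))
    return sections


sample = [
    "#ISOTOPE, 'Ar36:L2S1'\n",
    "No, Time, Intensity\n",
    "1, 101.4685919, 1.845379369941e-003\n",
    "2, 102.4901003, 2.153738546096e-003\n",
    "\n",
    "#ISOTOPE, 'Ar37:L1S1'\n",
    "No, Time, Intensity\n",
    "1, 101.4685919, -1.103785922163e-004\n",
]
for label, start, n_rows in find_isotope_sections(sample):
    print(label, start, n_rows)
```

With a real file, the lines can come straight from the open file object. The start row and row count could then presumably replace the values from the config file, for example as the skip_header and max_rows arguments of numpy.genfromtxt (I have not tested that part against my own files).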
Thanks a lot!
Upvotes: 0
Reputation: 42502
It's not entirely clear, but my understanding is that you have a bunch of CSV-ish datasets inside a single file, each with a header line (starting with *#ISOTOPE*) and a blank "footer" line?
Depending on the size, an option might be to open the files the basic way (using the open builtin), then loop on:
Repeat until the end of the file.
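One possible reading of such a loop, as a minimal sketch rather than a definitive implementation: skip lines until a *#ISOTOPE* header, skip the column-header line, collect rows until the blank footer, then start over. The name read_isotope_blocks is hypothetical, and the three-column layout (No, Time, Intensity) is assumed from the sample in the question.

```python
def read_isotope_blocks(fileobj):
    """Yield (label, rows) for each #ISOTOPE block of an open text file.

    Assumptions: every block starts with a line containing "#ISOTOPE",
    followed by one column-header line, then three-column data rows,
    and ends at a blank line or at the end of the file.
    """
    lines = iter(fileobj)
    for line in lines:
        if "#ISOTOPE" not in line:
            continue  # file preamble or stray separator, keep looking
        # "#ISOTOPE, 'Ar36:L2S1'" -> "Ar36:L2S1"
        label = line.split(",", 1)[-1].strip(" '*\r\n")
        next(lines, None)  # skip the "No, Time, Intensity" line
        rows = []
        for data_line in lines:
            if not data_line.strip():
                break  # blank "footer" line ends this block
            no, time, intensity = data_line.split(",")
            rows.append((int(no), float(time), float(intensity)))
        yield label, rows
```

Usage would look something like `with open(path) as f: blocks = dict(read_isotope_blocks(f))`, which gives one list of rows per isotope label. Because the file is read once, line by line, the header size and block lengths never need to be known in advance.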
Upvotes: 1