Geochron

Reputation: 111

How can I find the row number of a certain string in a text file in Python?

I wrote a Python script that opens a list of CSV files containing mass spectrometry data, loads the data with numpy.genfromtxt, runs calculations on them using statsmodels, and writes the results to a compiled Excel file. Inside each CSV file, the header and the internal structure may vary in size depending on the running conditions of the experiment.

For now I use a config file, which I read with configparser, and I keep a different config file for each set of experimental conditions. However, this is pretty clunky.

What I want to do is to measure the header size and the length of the dataframe, instead of reading it from a config file. The data for each isotope starts with a string, such as:

*#ISOTOPE, 'Ar36:L2S1'* and *#ISOTOPE, 'Ar37:L1S1'*

followed by the data for each isotope (3 columns), for example:

*#ISOTOPE, 'Ar36:L2S1'*
No, Time, Intensity
1, 101.4685919, 1.845379369941e-003
2, 102.4901003, 2.153738546096e-003
.....
599, 701.1342959, 2.087938052439e-003
600, 702.1343039, 2.000204060898e-003
(blank line)
*#ISOTOPE, 'Ar37:L1S1'*
No, Time, Intensity
1, 101.4685919, -1.103785922163e-004
2, 102.4901003, 3.526673114000e-004
etc.

I want to determine the row number of the data and the length of the data for each isotope.

When I try to import the whole data file without skipping the headers (to count the row index), I get errors related to the number of columns. I tried usecols = 1 to ignore the other columns, but this does not work either (it raises a ValueError).

I assume there is a simple solution to this, but my programming skills are not very good so far.

Can anyone help?

Cheers

Upvotes: 1

Views: 2861

Answers (2)

Geochron

Reputation: 111

OK, Masklinn pointed me in the right direction. The following code returns the index of the sections I am looking for:

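import glob

# Report the position of every "#ISOTOPE" section header in each CSV file.
FileList = glob.glob("*.csv")
for FileToProcess in FileList:
    # The with-block closes the file automatically; no explicit close() is needed.
    with open(FileToProcess) as readfile:
        for cnt, line in enumerate(readfile):  # cnt is a 0-based line index
            if "#ISOTOPE" in line:
                print("Line {}:{}".format(cnt, line.rstrip()))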
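
To go one step further and get the first data row and the number of data rows for each isotope, the same scan can be extended. This is only a minimal sketch: the function name find_isotope_sections is made up for illustration, and it assumes that every section consists of an "#ISOTOPE, '<label>'" line, exactly one column-name line, then the data rows, and that a section ends at a blank line, at the next "#ISOTOPE" line, or at the end of the file.

```python
def find_isotope_sections(lines):
    """Return (label, data_start, n_rows) for each "#ISOTOPE" section.

    data_start is the 0-based index of the first numeric row.
    Assumed layout (not confirmed by the instrument's file format):
    marker line, one column-name line, data rows, then a blank line or EOF.
    """
    sections = []
    label = None       # label of the section currently being read
    data_start = None  # index of its first data row
    idx = -1
    for idx, raw in enumerate(lines):
        line = raw.strip()
        if "#ISOTOPE" in line:
            if label is not None:  # previous section had no blank footer
                sections.append((label, data_start, max(0, idx - data_start)))
            # Text after the first comma, without spaces and quotes
            label = line.partition(",")[2].strip().strip("'")
            data_start = idx + 2   # skip the marker and the column names
        elif not line and label is not None:
            sections.append((label, data_start, max(0, idx - data_start)))
            label = None
    if label is not None:          # last section ran to the end of the file
        sections.append((label, data_start, max(0, idx + 1 - data_start)))
    return sections


if __name__ == "__main__":
    import glob

    for file_to_process in glob.glob("*.csv"):
        with open(file_to_process) as readfile:
            for label, start, n_rows in find_isotope_sections(readfile):
                print("{}: {} starts at line {}, {} rows".format(
                    file_to_process, label, start, n_rows))
```

The start index and row count could then stand in for the values currently read from the config file, for example as the skip_header and max_rows arguments of numpy.genfromtxt, provided the layout assumptions above hold for the real files.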

Thanks a lot!

Upvotes: 0

Masklinn

Reputation: 42502

It's not entirely clear, but my understanding is that you have a bunch of CSV-ish datasets inside a single file, each with a header line (starting with *#ISOTOPE) and a blank "footer" line?

Depending on the size, an option might be to open files the basic way (using the open builtin), then loop on:

  • process the magic header (read one line and parse that)
  • copy everything to a temporary file or StringIO until the first blank line
  • parse the tempfile or StringIO as CSV, process as usual

Repeat until the end of the file.
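
The loop described above could be sketched roughly as follows. This is an illustration under assumptions rather than a tested solution for the real files: the generator name iter_sections is hypothetical, the marker line is assumed to contain "#ISOTOPE", and the standard csv module stands in for whatever parser is actually used downstream.

```python
import csv
import io


def iter_sections(fileobj):
    """Yield (header_line, rows) for each "#ISOTOPE" block of an open text file.

    rows is a list of lists of strings; the first row holds the column names.
    """
    line = fileobj.readline()
    while line:
        if "#ISOTOPE" not in line:
            line = fileobj.readline()  # skip the file header and blank lines
            continue
        header = line.strip()          # the "magic" header line
        buffer = io.StringIO()
        line = fileobj.readline()
        # Copy lines until a blank line, the next marker, or end of file
        while line and line.strip() and "#ISOTOPE" not in line:
            buffer.write(line)
            line = fileobj.readline()
        buffer.seek(0)
        # skipinitialspace drops the blank that follows each comma
        yield header, list(csv.reader(buffer, skipinitialspace=True))


if __name__ == "__main__":
    demo = io.StringIO(
        "#ISOTOPE, 'Ar36:L2S1'\n"
        "No, Time, Intensity\n"
        "1, 101.4685919, 1.845379369941e-003\n"
        "\n"
        "#ISOTOPE, 'Ar37:L1S1'\n"
        "No, Time, Intensity\n"
        "1, 101.4685919, -1.103785922163e-004\n"
    )
    for header, rows in iter_sections(demo):
        print(header, "->", len(rows) - 1, "data rows")
```

Because each section ends up in its own StringIO, the buffer could also be handed to numpy.genfromtxt (which accepts file-like objects) instead of csv.reader, so the varying header size never has to be known in advance.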

Upvotes: 1
