Python: Reading and extracting data from multiple files and writing the extracted data to multiple files

Question

I need to read 200 consecutively numbered files, named nwirp1.rec ... nwirp200.rec, one at a time, then extract data from each file into its own consecutively numbered output file. I have written some code, but it is not working. My data looks like this:

Parameters ----->

Parameter Estimated value

hklay1 3.278692E-06

kppt1 4.249307E-07

kppt2 2.849132E-06

See file nwirp_nsmc.sen for parameter sensitivities.

I need to extract this portion from each file

hklay1 3.278692E-06

kppt1 4.249307E-07

kppt2 2.849132E-06

and write it to a separate output file for each input, named data1.txt ... data200.txt

I have tried this way but it is not working:

for i in range(1, 200):
    with open('nwirp%s.upw' % i, 'r') as f:
        for line in f:
            if line.strip().startswith("Parameter      Estimated value"):
                new_file = []
                line = next(f)
            while not line.strip().startswith("See file"):
                new_file.append(line)
                line = next(f)
            with open('nwirp%s.upw' % i, 'w') as outfile:
                print >>outfile, "".join(new_file)

It is showing NameError: name 'new_file' is not defined.

zwer · Accepted Answer

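Your first line match (if line.strip().startswith("Parameter...")) might not work properly, for example if the spacing between the column names in the file differs from the spacing in your string. In that case new_file never gets defined, which probably results in the error you see when you try to append to it or write it out.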
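
If you do want to keep the line-by-line approach, here is a minimal sketch of one way to avoid the NameError. It assumes the header line always contains the words Parameter, Estimated and value in that order, and that the block ends at the line starting with "See file". The function names extract_block and extract_all are made up for illustration, and the nwirpN.upw / dataN.txt names are taken from your code and your description.

```python
def extract_block(lines):
    """Collect the non-blank lines between the table header and the 'See file' line."""
    block = []       # created before the loop, so it exists even if the header never matches
    inside = False
    for line in lines:
        words = line.split()   # split() ignores how many spaces separate the columns
        if not inside:
            inside = words[:3] == ["Parameter", "Estimated", "value"]
            continue
        if words[:2] == ["See", "file"]:
            break
        if words:
            block.append(line.strip())
    return block


def extract_all(first=1, last=200):
    """Hypothetical driver: read nwirpN.upw and write dataN.txt for each N."""
    for i in range(first, last + 1):   # last + 1, because range() excludes its end point
        with open("nwirp{}.upw".format(i)) as src:
            block = extract_block(src)
        with open("data{}.txt".format(i), "w") as out:
            out.write("\n".join(block) + "\n")
```

Because the list is created before the loop, a file without the header simply produces an empty output instead of raising an error. Note also that range(1, 200) in your code stops at file 199.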

Instead of hunting the data line by line, provided that the files are not too large, I'd suggest just simplifying it by using regex to capture the lines between your strings then overwriting the content with the matched lines:

import re

matcher = re.compile(r"Estimated value\s+(.*?)\s+See file", re.DOTALL)
for i in range(1, 201):  # files 1..200 inclusive; range works on both Python 2 and 3
    with open("nwirp{}.upw".format(i), "r+") as f:  # open in read-write
        content = matcher.findall(f.read()) # read whole file and grab the match(es?)
        f.seek(0)  # go back to the beginning
        f.write("".join(content)) # concatenate just in case of more matches
        f.truncate()  # remove the extra content

That assumes you want to overwrite the file you are reading, as your code does. If you want to write to a different file (data1.txt ... data200.txt) instead, replace the f.seek() ... f.truncate() lines with:

with open("data{}.txt".format(i), "w") as out:
    out.write("".join(content)) # concatenate just in case of more matches

If you don't want to use regex, given the simple structure of your match, you can achieve a similar effect with str.find(): locate the indexes of the first and the last marker strings, then take the substring of everything between those two.
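
A rough sketch of that str.find() variant, assuming the same two marker strings as the regex ("Estimated value" and "See file"). The helper name between is hypothetical, not something from a library.

```python
def between(text, start="Estimated value", end="See file"):
    """Return the text between the first `start` and the next `end`, or "" if either is missing."""
    begin = text.find(start)
    if begin == -1:            # find() returns -1 when the marker is absent
        return ""
    begin += len(start)        # skip past the start marker itself
    stop = text.find(end, begin)
    if stop == -1:
        return ""
    return text[begin:stop].strip()


# Possible use inside the loop over i, writing to a separate output file:
#     with open("nwirp{}.upw".format(i)) as f:
#         content = between(f.read())
#     with open("data{}.txt".format(i), "w") as out:
#         out.write(content)
```

Unlike findall, this only returns the first block in each file, which should be enough if the parameter table appears once.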
