Mirry610

Reputation: 23

Extracting specific data from multiple text files and writing them into columns in csv

I'm trying to write code that searches multiple report files for specific data and writes it into columns in a single CSV.

The lines I need don't always fall at the same line number in each report, so I'm searching for the text on the lines below:

Estimate file: pog_example.bef

Estimate ID: o1_p1

61078 (100.0%) estimated.

And I want to write the data from each text file into columns in a csv as below:

example.bef, o1_p1, 61078 (100.0%) estimated

So far I have this script, which writes out the first of my criteria, but I can't figure out how to keep reading to find my second and third lines and populate the second and third columns:

from glob import glob
import fileinput
import csv

with open('percentage_estimated.csv', 'w', newline='') as est_report:
    writer = csv.writer(est_report)
    for line in fileinput.input(glob('*.bef*')):

        if 'Estimate file' in line:
            writer.writerow([line.split('pog_')[1].strip()]) 

I'm pretty new to Python, so any help would be appreciated!

Upvotes: 0

Views: 724

Answers (2)

Mirry610

Reputation: 23

If anyone wants to see what finally worked for me:

from glob import glob
import csv

all_rows = []

with open('percentage_estimated.csv', 'w', newline='') as bef_report:
    writer = csv.writer(bef_report)
    writer.writerow(['File name', 'Est ID', 'Est Value'])
    for file in glob('*.bef*'):
        with open(file,'r') as f:
            for line in f:
                if 'Estimate file' in line:
                    fname = line.split('pog_')[1].strip()
                    line = next(f)
                    est_id = line.split('Estimate ID:')[1].strip()
                    line = next(f)
                    line = next(f)
                    line = next(f)
                    line = next(f)
                    line = next(f)
                    line = next(f)
                    line = next(f)
                    value = line.strip()
                    row = [fname, est_id, value]
                    all_rows.append(row)
                    break  
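    # Write once, after every file has been read; inside the file loop
    # this call would write the earlier rows again for each new file
    writer.writerows(all_rows)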
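
The run of next(f) calls above relies on the value always sitting a fixed number of lines below the 'Estimate file' line. As a rough alternative, here is a minimal sketch that keys on the text of each line instead. It assumes the value line is the first line ending in 'estimated.' after both other fields have been seen; the function name extract_row and the sample lines are made up for illustration.

```python
def extract_row(lines):
    """Return [file name, estimate ID, value] from an iterable of lines, or None."""
    fname = est_id = None
    for line in lines:
        text = line.strip()
        if 'Estimate file' in text:
            # Keep whatever follows 'pog_', as the scripts in this thread do
            fname = text.split('pog_', 1)[1]
        elif 'Estimate ID:' in text:
            est_id = text.split('Estimate ID:', 1)[1].strip()
        elif fname is not None and est_id is not None and text.endswith('estimated.'):
            # Assumption: the value line is the first one ending in 'estimated.'
            return [fname, est_id, text]
    return None


# Hypothetical report content with blank and unrelated lines in between
sample = [
    'a line\n',
    'Estimate file: pog_example.bef\n',
    '\n',
    'Estimate ID: o1_p1\n',
    'some other line\n',
    '\n',
    '61078 (100.0%) estimated.\n',
]
row = extract_row(sample)
print(row)
```

Because an open file is also an iterable of lines, extract_row(f) could stand in for the block of next(f) calls, provided the marker text really is stable across reports.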

Upvotes: 1

Zach Young

Reputation: 11188

I think I see what you're trying to do, but I'm not sure.

I think your BEF file might look something like this:

a line
another line
Estimate file: pog_example.bef
Estimate ID: o1_p1
61078 (100.0%) estimated.
still more lines

If that's true, then once you find a line with 'Estimate file', you need to take control from the for-loop and start manually iterating the lines because you know which lines are coming up.

This is a very simple example script which opens my mock BEF file (above) and automatically iterates the lines till it finds 'Estimate file'. From there it processes each line specifically, using next(bef_file) to iterate to the next line, expecting them to have the correct text:

import csv

all_rows = []

# The with-statement closes the file even if parsing raises an error
with open('input.bef') as bef_file:
    for line in bef_file:
        if 'Estimate file' in line:
            fname = line.split('pog_')[1].strip()

            line = next(bef_file)
            est_id = line.split('Estimate ID:')[1].strip()

            line = next(bef_file)
            value = line.strip()

            row = [fname, est_id, value]
            all_rows.append(row)
            break  # stop iterating lines in this file

# Closing the CSV file guarantees the rows are flushed to disk
with open('output.csv', 'w', newline='') as csv_out:
    writer = csv.writer(csv_out)
    writer.writerow(['File name', 'Est ID', 'Est Value'])
    writer.writerows(all_rows)

When I run that I get this for output.csv:

File name,Est ID,Est Value
example.bef,o1_p1,61078 (100.0%) estimated.

If there are blank lines in your data between the lines you care about, manually step over them with next(bef_file) statements.
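
If the number of blank lines varies, counting next() calls gets fragile. Here is a minimal sketch of one way around that, assuming the three lines you care about still arrive in the same order and only blank lines sit between them. The helper names next_nonblank and parse_report are my own, and io.StringIO stands in for an open BEF file so the example runs by itself.

```python
import io


def next_nonblank(lines):
    """Advance the iterator and return the next non-blank line, stripped."""
    for line in lines:
        if line.strip():
            return line.strip()
    raise ValueError('ran out of lines before finding a non-blank one')


def parse_report(lines):
    """Return [file name, estimate ID, value], or None if no 'Estimate file' line exists."""
    lines = iter(lines)  # share one iterator between this loop and the helper
    for line in lines:
        if 'Estimate file' in line:
            fname = line.split('pog_')[1].strip()
            est_id = next_nonblank(lines).split('Estimate ID:')[1].strip()
            value = next_nonblank(lines)
            return [fname, est_id, value]
    return None


# Mock file contents with blank lines between the lines of interest
mock = io.StringIO(
    'a line\n'
    'Estimate file: pog_example.bef\n'
    '\n'
    'Estimate ID: o1_p1\n'
    '\n'
    '61078 (100.0%) estimated.\n'
)
row = parse_report(mock)
print(row)
```

To cover several reports, you could call parse_report(f) on each file opened from glob('*.bef*') and append every non-None result to all_rows before writing the CSV.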

Upvotes: 1
