user10019553

Text Extraction from real world messy files

WET READ: ___ ___ ___ 7:31 PM
 Persistent right lower lung opacity and right pleural effusion. Effusion
   perhaps slightly decreased since radiograph dated ___.
 ______________________________________________________________________________
                                 FINAL REPORT

PA AND LATERAL CHEST RADIOGRAPH.

TECHNIQUE:  AP upright portable radiograph of chest was reviewed in comparison
to prior radiograph from ___.

 
 As compared to ___ there is interval improvement of pulmonary edema. 
 Right lower lobe consolidation with internal cavitation surrounded by pleural
 effusion appears to be grossly unchanged in the short interim.  There is no
 evidence of progression of left consolidation.  Small amount of left pleural
 effusion is noted.  

I have text files like this, and I want to keep only the data that comes after "FINAL REPORT". In other words, I want to delete everything in each file up to and including the "FINAL REPORT" line.

I have tried regular expressions, but couldn't find a way to do this.

Upvotes: 0

Views: 40

Answers (1)

Tim Roberts

Reputation: 54733

Just do it line by line:

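import os

def copy( infile ):
    # Write the kept lines to a temporary file next to the original.
    tempname = infile + '.tmp'
    if os.path.exists( tempname ):
        os.remove( tempname )

    keep = False
    with open(infile) as fin, open(tempname,'w') as fout:
        for line in fin:
            if keep:
                # Note: strip() also removes leading indentation.
                print( line.strip(), file=fout )
            elif "FINAL REPORT" in line:
                # Start keeping from the line after the marker.
                keep = True
    # Replace the original file with the trimmed copy.
    os.remove( infile )
    os.rename( tempname, infile )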
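
Since the question mentions regular expressions, here is a rough sketch of that route as well, assuming each file is small enough to read into memory in one go. The function name text_after_marker and the file name in the usage note are made up for illustration. Unlike the line-by-line version, this one leaves the remaining text untouched, including indentation.

```python
import re

def text_after_marker(text, marker="FINAL REPORT"):
    """Return everything after the first line that contains `marker`.

    If the marker does not occur, the text is returned unchanged.
    """
    # re.DOTALL lets '.' match newlines, and the lazy '.*?' stops at the
    # first occurrence of the marker. '[^\n]*\n?' then consumes the rest
    # of the marker line, including its trailing newline if present.
    pattern = r".*?" + re.escape(marker) + r"[^\n]*\n?"
    return re.sub(pattern, "", text, count=1, flags=re.DOTALL)


if __name__ == "__main__":
    sample = (
        "WET READ: notes\n"
        "            FINAL REPORT\n"
        "\n"
        "PA AND LATERAL CHEST RADIOGRAPH.\n"
    )
    print(text_after_marker(sample))
```

To apply it to a file, something like pathlib.Path("report.txt").read_text() could feed the function, with write_text() saving the result back; "report.txt" is only a placeholder name.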

Upvotes: 1
