Ninjasoup

Reputation: 173

Remove a block of text from a file

I have searched for this and tried to adapt some of the answers to my problem.

The text format I'm trying to remove is like this:

text
text 
            rectangle 
           (
                gab "BACKGROUND" 
                set("can be different") 
                origin(can be different) 
                width(can be different)
                height(can be different)
            )
text
text

I'm trying to remove the word rectangle and everything between the brackets, including the brackets themselves. These rectangle blocks appear several times in the file.

So far I have the following:

def removeBlock(): 

for somefile in os.listdir(source_folder):
    if (somefile.startswith(('DTSPSM_')) and somefile.endswith(('.ddl'.lower()))):
        with open(os.path.join(source_folder, somefile), 'r') as file :

            lines = file.read()
            lines.strip()
            for lines in file:
                blockstart = lines.index('rectangle')         
                del(lines[blockstart:blockstart+7])               
                open(os.path.join(source_folder, somefile), 'w+').writelines(lines) 

But it doesn't remove the rectangle lines. Can someone help, please?

My current (working) code looks like this now:

import shutil
import tempfile
from pathlib import Path
import typing

source_folder = r'C:\Test'
test_file = r'C:\Test\DTSPSM_01.ddl'

def parseFile(src_file: typing.Union[Path, str], simulation: bool=False):
    print(src_file)
    if isinstance(src_file, str):
        src_file = Path(src_file)
    with src_file.open('r') as input_file:
        counter = 0
        for line in input_file:
            if line.strip() =='rectangle':
                counter = 7
                print(f'-{counter}-removing: {line}')
                continue
            elif counter > 0:
                counter -= 1
                print(f'-{counter}-removing: {line}')
                continue
            else:
                yield line

def clean_file(src_file:typing.Union[Path, str], simulation: bool=False):
    if isinstance(src_file, str):
        src_file = Path(src_file)
    with tempfile.TemporaryDirectory() as tmpdirname:
        temp_filename = f'{tmpdirname}/{src_file.name}.txt'
        with open(temp_filename, 'w') as temp_file:
            for line in parseFile(src_file, simulation=simulation):
                temp_file.write(line)
        if not simulation:
            shutil.copy(temp_filename, src_file)

def main():

    for src_file in Path(source_folder).glob("DTSPSM_*.ddl"):
        print(f'***{src_file}***\n')
        clean_file(src_file)


if __name__== "__main__":
    main()         

Upvotes: 1

Views: 296

Answers (1)

Maarten Fabré

Reputation: 7058

To avoid reading from and writing to the same file at once, you can write the result to a temporary file and then copy it over the original. That way, if something interrupts the parsing, the original file is left untouched.

Parsing a single file:

import shutil
import tempfile
import typing
from pathlib import Path

def parse_file(src_file: typing.Union[Path, str], simulation: bool=False):
    if isinstance(src_file, str):
        src_file = Path(src_file)
    with src_file.open('r') as input_file:
        counter = 0
        for line in input_file:
            if line.strip() =='rectangle':
                counter = 7
                print(f'-{counter}-removing: {line}')
                if simulation:
                    yield line
                continue
            elif counter > 0:
                counter -= 1
                print(f'-{counter}-removing: {line}')
                if simulation:
                    yield line
                continue
            else:
                yield line

This is easy to extend: pass in, as an extra argument, a dict that maps each marker word to the number of lines to ignore after it. I've also added a simulation argument so you can do a dry run without altering anything.
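
As a rough illustration of that extension, here is a minimal sketch that works on any iterable of lines. The names skip_blocks and markers are hypothetical and not part of the answer's code, and the mapping of 'rectangle' to 3 is chosen only to fit the short sample:

```python
from typing import Dict, Iterable, Iterator


def skip_blocks(lines: Iterable[str], markers: Dict[str, int]) -> Iterator[str]:
    """Yield lines, dropping each marker line plus the N lines that follow it."""
    remaining = 0  # lines still to drop after the most recent marker
    for line in lines:
        key = line.strip()
        if key in markers:
            # Marker found: drop it and start counting down the lines after it.
            remaining = markers[key]
            continue
        if remaining > 0:
            remaining -= 1
            continue
        yield line


sample = [
    "text1\n",
    "    rectangle\n",
    "    (\n",
    "        width(10)\n",
    "    )\n",
    "text2\n",
]
# Hypothetical mapping: a 'rectangle' marker is followed by 3 lines to drop.
kept = list(skip_blocks(sample, {"rectangle": 3}))
print(kept)
```

For the blocks in the question the mapping would presumably be {"rectangle": 7}, matching the counter used in parse_file.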

parse_file is easy to test by passing it a file whose contents you know:

test_file = "DTSPSM_0.ddl"
for line in parse_file(test_file):
    print(line, end='')  # each line already ends with a newline

Output:

text1
text2 
-7-removing:             rectangle 
-6-removing:            (
-5-removing:                 gab "BACKGROUND" 
-4-removing:                 set("can be different") 
-3-removing:                 origin(can be different) 
-2-removing:                 width(can be different)
-1-removing:                 height(can be different)
-0-removing:             )
text3
text4

Replacing the original file:

def clean_file(src_file:typing.Union[Path, str], simulation: bool=False):
    if isinstance(src_file, str):
        src_file = Path(src_file)
    with tempfile.TemporaryDirectory() as tmpdirname:
        temp_filename = f'{tmpdirname}/{src_file.name}.txt'
        with open(temp_filename, 'w') as temp_file:
            for line in parse_file(src_file, simulation=simulation):
                temp_file.write(line)
        if not simulation:
            shutil.copy(temp_filename, src_file)

This writes the parsed lines to a file in a temporary directory and, once finished, copies that temporary file over the original.
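
If an interruption during the final copy is also a concern, one possible variant (not what clean_file above does) is to create the temporary file next to the original and swap it in with os.replace, which on POSIX systems is an atomic rename. This is only a sketch: the name replace_file_contents and the demo file are hypothetical, and it does not clean up the temporary file if writing fails.

```python
import os
import tempfile
from pathlib import Path
from typing import Iterable


def replace_file_contents(src_file: Path, new_lines: Iterable[str]) -> None:
    """Write new_lines to a temp file beside src_file, then swap it in."""
    # Same directory as the target, so os.replace stays on one filesystem.
    with tempfile.NamedTemporaryFile(
        "w", dir=src_file.parent, suffix=".tmp", delete=False
    ) as tmp:
        tmp.writelines(new_lines)
        tmp_path = Path(tmp.name)
    # The temp file is closed here; os.replace overwrites an existing target.
    os.replace(tmp_path, src_file)


# Demonstration on a throwaway file, not on real data.
with tempfile.TemporaryDirectory() as demo_dir:
    target = Path(demo_dir) / "DTSPSM_demo.ddl"
    target.write_text("old\n")
    replace_file_contents(target, ["text1\n", "text2\n"])
    result = target.read_text()
print(result)
```

With the generator from this answer, the call would look like replace_file_contents(src_file, parse_file(src_file)).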

clean_file can also be tested easily on a single file.

Finding the files:

pathlib.Path.glob is easier to use than os.listdir:

for src_file in Path(sourcedir).glob("DTSPSM_*.ddl"):
    print(f'***{src_file}***\n')
    clean_file(src_file, simulation=True)

This can easily be tested separately by commenting out the last line.
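
To make the comparison with os.listdir concrete, here is a small sketch that runs both approaches against a throwaway directory. The file names are made up for the demonstration:

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as demo_dir:
    # Create a few empty files, only some of which should match.
    for name in ("DTSPSM_01.ddl", "DTSPSM_02.ddl", "OTHER_01.ddl", "DTSPSM_03.txt"):
        (Path(demo_dir) / name).touch()

    # os.listdir returns bare names, so the filtering has to be done by hand.
    via_listdir = sorted(
        name
        for name in os.listdir(demo_dir)
        if name.startswith("DTSPSM_") and name.endswith(".ddl")
    )

    # Path.glob filters by pattern and yields Path objects directly.
    via_glob = sorted(p.name for p in Path(demo_dir).glob("DTSPSM_*.ddl"))

print(via_listdir)
print(via_glob)
```

Both lists contain the same two names. The glob version also hands back full Path objects, so no os.path.join is needed before opening the files.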

Upvotes: 1
