Reputation: 1191

Search for string and delete line that contains string and the line underneath

I have a text file that contains:

### 174.10.150.10 on 2018-06-20 12:19:47.533613 ###
IP : 174.10.150.10 : 

IP : ALL :

I currently have code that uses a regex to search for a date/time string. How can I delete the line that contains the string I find, together with the line underneath it?

So both of these lines would get deleted:

### 174.10.150.10 on 2018-06-20 12:19:47.533613 ###
IP : 174.10.150.10 :

My code currently just adds 'None' to the bottom of the text file.

import re

def run():  
    try:
        with open('file.txt', 'r') as f:
            with open('file.txt', 'a') as f2:
                reg = re.compile('###\s+\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}.+(\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}.\d{0,})\s###')
                for line in f:
                    m = reg.match(line)
                answer = raw_input("Delete line? ")
                if answer == "y":

                    # delete line that contains "###" and line underneath
                    f2.write(str(m))

                else:
                    print("You chose no.")
    except OSError as e:
        print (e)

run()

Upvotes: 0

Answers (3)

Rory Daulton

Reputation: 22564

(EDIT: I now understand from your comments that you have a blank line after two data lines, so when you want to delete a line you also want to delete the next two lines. My code has been adjusted to do that.)

Here is a revised version of your code with several changes. I write a new file rather than overwriting the old one, for safety and to avoid needing to keep the entire file in memory at once. I combined the two with statements into one line for readability; similarly, I split the regex string to allow shorter lines of code. To avoid holding more than one line in memory at once, I use a countdown variable, skipline, to record how many upcoming lines should be left out of the new file. I also show each matching line before asking whether to delete it (together with the lines that follow it). Lines that do not contain the date and time are copied unchanged: for those the regex match result is None, so no question is asked. Finally, I changed raw_input to input so this code will run in Python 3. Change it back to raw_input for Python 2.

By the way, the reason your code just adds 'None' to the end of the file is that you put your write line outside the main loop over the lines of the file. Thus you write only the regex match object for the last line of the file. Since the last line in your file does not have a date and time, the regex did not match so the string representation of the failed match is 'None'. In your second with statement you opened file.txt in append mode, so that 'None' is appended to the file.
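A minimal sketch of that behaviour, using a simplified stand-in pattern (an assumption made for brevity, not the full regex from the question). It shows that the match variable is overwritten on every pass through the loop, so only the result for the last line survives, and that the string form of a failed match is 'None'.

```python
import re

# Simplified stand-in for the question's pattern (an assumption for brevity).
reg = re.compile(r'###\s+\S+\s+on\s+.+\s###')

lines = [
    "### 174.10.150.10 on 2018-06-20 12:19:47.533613 ###\n",
    "IP : 174.10.150.10 :\n",
    "\n",
    "IP : ALL :\n",
]

m = None
for line in lines:
    m = reg.match(line)  # overwritten on every pass through the loop

# After the loop, only the result for the last line is left.
last_result = str(m)
print(last_result)
```

Because the write happens after the loop and the file is opened in append mode, that single string is what lands at the bottom of the file.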

I want to emphasize that you should create a new file. If you really want to overwrite the old file, the safe way is to first create a new file under a slightly different name. Then, if that file is written successfully, keep a copy of the old file under a name like file.bak and replace the old file with the new one. This takes possible OS errors into account, as your code attempts to do. Without something like that, an error could end up deleting your file completely or mangling it. I leave that part of the code to you.
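One possible sketch of that write-then-swap idea, for Python 3.3 or later. The helper name replace_safely, the .tmp and .bak suffixes, and the demo file are all hypothetical choices for illustration, not part of the original code.

```python
import os
import shutil
import tempfile

def replace_safely(path, new_lines, backup_suffix='.bak', tmp_suffix='.tmp'):
    """Write new_lines to a temporary file, back up the original, then swap."""
    tmp_path = path + tmp_suffix
    backup_path = path + backup_suffix

    # 1. Write the new content to a separate file first.
    with open(tmp_path, 'w') as tmp:
        tmp.writelines(new_lines)

    # 2. Keep a copy of the original in case something goes wrong later.
    shutil.copy2(path, backup_path)

    # 3. Only now replace the original with the new file.
    os.replace(tmp_path, path)

# Small demonstration in a throwaway directory.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, 'file.txt')
with open(demo_path, 'w') as f:
    f.write("old line\n")

replace_safely(demo_path, ["new line\n"])

with open(demo_path) as f:
    current = f.read()
with open(demo_path + '.bak') as f:
    backup = f.read()
```

If step 1 or step 2 raises an OSError, the original file has not been touched yet, which is the point of doing the swap last.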

import re

def run():  
    try:
        with open('file.txt', 'r') as f, open('file.tmp', 'w') as f2:
            # Raw strings keep the backslashes in the pattern intact
            reg = re.compile(r'###\s+\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}'
                             r'.+(\d{4}-\d{2}-\d{2}\s\d{2}'
                             r':\d{2}:\d{2}.\d{0,})\s###')
            skipline = 0  # do not skip lines
            for line in f:
                if skipline:
                    skipline -= 1
                    continue  # Don't write or process this line
                m = reg.match(line)
                if m:
                    answer = input("Delete line {} ? ".format(m.group()))
                    if answer == "y":
                        skipline = 2 # leave out this and next 2 lines
                    else:
                        print("You chose no.")
                if not skipline:
                    f2.write(line)
    except OSError as e:
        print(e)

run()

Upvotes: 1

Melvin Abraham

Reputation: 3046

With some basic refactoring, here's the result...

import re
valid_lines = []

def run():  
    try:
        with open('file.txt', 'r') as f:
            reg = re.compile(r'###\s+\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}.+(\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}.\d{0,})\s###\s?')
            lines = f.readlines()
            invalid_index = -10

            for a in range(len(lines)):
                if invalid_index == (a - 1):
                    # Skip the line underneath the deleted line
                    continue

                reg_result = reg.match(lines[a])

                if reg_result is not None:
                    # The line matches the regexp; ask before deleting it.
                    answer = raw_input("Delete line? ")  # use input() on Python 3

                    if answer.lower() == 'y':
                        # Remember the index so the next line is skipped too
                        invalid_index = a
                    else:
                        print("You chose no.")
                        valid_lines.append(lines[a])
                else:
                    valid_lines.append(lines[a])

        with open('file.txt', 'w') as f:
            # Overwrite the file...
            f.writelines(valid_lines)

    except OSError as e:
        print (e)

run()

If you want to remove every line that starts with ###, consider using this as the regexp instead: ###.*

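EDIT: In your regular expression, you should add a \s? at the end to optionally match the trailing \n, as each line read from the file ends with a newline. Also consider using fullmatch() (available from Python 3.4) instead of match(), so that the whole line has to match rather than just its beginning.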
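A short sketch of the difference, using a slightly tidied variant of the pattern (the literal dot before the fractional seconds is escaped here, which is my assumption about the intent). It requires Python 3.4 or later for fullmatch().

```python
import re

# Tidied variant of the question's pattern (an assumption, not the original).
pattern = (r'###\s+\d{1,3}(?:\.\d{1,3}){3}.+'
           r'\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}\.\d*\s###')

strict = re.compile(pattern)            # no allowance for a trailing newline
lenient = re.compile(pattern + r'\s?')  # optional trailing whitespace

line = "### 174.10.150.10 on 2018-06-20 12:19:47.533613 ###\n"
trailing = "### 174.10.150.10 on 2018-06-20 12:19:47.533613 ### extra text\n"

# match() only anchors at the start, so the newline does not matter.
match_strict = strict.match(line) is not None

# fullmatch() must consume the whole string, newline included.
fullmatch_strict = strict.fullmatch(line) is not None
fullmatch_lenient = lenient.fullmatch(line) is not None

# Text after the closing ### is accepted by match() but not by fullmatch().
match_trailing = strict.match(trailing) is not None
fullmatch_trailing = lenient.fullmatch(trailing) is not None

print(match_strict, fullmatch_strict, fullmatch_lenient,
      match_trailing, fullmatch_trailing)
```

So fullmatch() is the stricter check, and it only works on lines read from a file if the pattern allows for the newline at the end.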

Upvotes: 1

Aaron

Reputation: 1368

I refactored the filtering part into a function called filter_lines and moved the regex into a module-level variable. This approach makes use of an iterator.

import re

regex = re.compile(r'###\s+\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}.+(\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}.\d{0,})\s###')

def filter_lines(lines):
    it = iter(lines)

    try:
        while True:
            line = next(it)
            m = regex.match(line)

            if m:
                # You may add the question-answer code here to ask the user whether to delete the matched line.
                next(it)  # Consume the line following the matched line
                continue

            yield line
    except StopIteration:
        # Since Python 3.7 (PEP 479), a StopIteration that escapes a generator body is converted to RuntimeError, so it has to be caught here.
        # https://www.python.org/dev/peps/pep-0479/
        pass

def run():
    try:
        # Read and filter first; the file is closed again before any writing
        with open('file.txt', 'r') as f:
            filtered_lines = list(filter_lines(f))
        print(*filtered_lines, sep='')
        # To actually write the result, open the file in 'w' mode
        # (append mode would add the filtered lines after the old content):
        # with open('file.txt', 'w') as f2:
        #     f2.writelines(filtered_lines)
    except OSError as e:
        print(e)

run()

This program should print the resultant content.

Upvotes: 1
