tirenweb

Reputation: 31739

Python 2.5.2: remove what is found between two lines that contain two specific strings

Is there any way to remove what is found between two lines that contain two specific strings?

I mean: I want to remove everything between 'heaven' and 'hell' in a text file containing this text:

I'm in heaven
foobar
I'm in hell

After executing the script/function I'm asking for, the text file should be empty.

Upvotes: 0

Views: 3463

Answers (5)

nosklo

Reputation: 223062

Use a flag to indicate whether you're writing or not.

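from __future__ import with_statement
import os

writing = True

with open('myfile.txt') as f:
    # The output file must be opened for writing
    with open('output.txt', 'w') as out:
        for line in f:
            if writing:
                if "heaven" in line:
                    # Opening marker: stop copying lines
                    writing = False
                else:
                    out.write(line)
            elif "hell" in line:
                # Closing marker: resume copying from the next line
                writing = True

# Replace the original file with the filtered copy
os.remove('myfile.txt')
os.rename('output.txt', 'myfile.txt')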

EDIT

As extraneon pointed out in the comments, the requirement is to remove the lines between two concrete strings. That means that if the second (closing) string is never found, nothing should be removed. That can be achieved by keeping a buffer of lines. The buffer is discarded if the closing string "I'm in hell" is found, but if the end of the file is reached without finding it, the whole contents must be written to the file.

Example:

I'm in heaven
foo
bar

This should keep the whole contents, since there's no closing tag and the question says between two lines.

Here's an example that does that, for completeness:

from __future__ import with_statement
import os

writing = True
with open('myfile.txt') as f:
    # The output file must be opened for writing
    with open('output.txt', 'w') as out:
        for line in f:
            if writing:
                if "heaven" in line:
                    writing = False
                    buffer = [line]
                else:
                    out.write(line)
            elif "hell" in line:
                writing = True
            else:
                buffer.append(line)
        else:
            if not writing:
                #There wasn't a closing "I'm in hell", so write buffer contents
                out.writelines(buffer)

os.remove('myfile.txt')
os.rename('output.txt', 'myfile.txt')

Upvotes: 3

tirenweb

Reputation: 31739

See below. I don't know if it's right, but it seems to work.

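import os
import re

# Compile once; re.S lets '.' span newlines, re.I ignores case
pattern = re.compile('Im in heaven.*?Im in hell', re.I | re.S)

# 'path' must already hold the top-level directory to process
for path, dirs, files in os.walk(path):
    for filename in files:
        fullpath = os.path.join(path, filename)

        # Read the whole file into memory
        f = open(fullpath, 'r')
        data = f.read()
        f.close()

        # Drop everything from the opening marker to the closing marker
        data = pattern.sub("", data)

        # Write the result back over the original file
        f = open(fullpath, 'w')
        f.write(data)
        f.close()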

Anyway, when I execute it, it leaves a blank line. I mean, if I have this function:

public function preFetchAll(Doctrine_Event $event){ 
//Im in heaven
$a = sfContext::getInstance()->getUser()->getAttribute("passw.formulario");
var_dump($a);
//Im in hell
foreach ($this->_listeners as $listener) {
    $listener->preFetchAll($event);
}
}

and I execute my script, I get this:

public function preFetchAll(Doctrine_Event $event){ 

foreach ($this->_listeners as $listener) {
    $listener->preFetchAll($event);
}
}

As you can see there is an empty line between "public..." and "foreach...".

Why?
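
One likely explanation, offered as a sketch rather than a confirmed diagnosis: the pattern matches only from "Im in heaven" to "Im in hell", so whatever precedes the opening marker on its line and the newline after the closing marker fall outside the match and are left behind as a leftover line. The variant below widens the match to whole lines. The names `WHOLE_LINES` and `strip_block` are hypothetical, and the code assumes the markers appear exactly as in the pattern above.

```python
import re

# Hypothetical variant of the pattern above: it also consumes the rest of
# each marker line, including the newline after the closing marker.
WHOLE_LINES = re.compile(
    r'^[^\n]*Im in heaven.*?Im in hell[^\n]*\n?',
    re.I | re.S | re.M
)

def strip_block(data):
    """Return data with each marked block, and its marker lines, removed."""
    return WHOLE_LINES.sub('', data)
```

Calling `data = strip_block(data)` in place of the `sub` call in the loop should remove the marker lines entirely instead of leaving a stray line.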

Javi

Upvotes: -1

Alex Martelli

Reputation: 882231

Looks like by "remove" you mean "rewrite the input file in-place" (or make it look like you're so doing;-), in which case fileinput.input helps:

import fileinput
writing = True
for line in fileinput.input(['thefile.txt'], inplace=True):
    if writing:
        if 'heaven' in line: writing = False
        else: print line,
    else:
        if 'hell' in line: writing = True

Upvotes: 1

Jacinda

Reputation: 5072

You could do something like the following with regular expressions. There are probably more efficient ways to do it since I'm still learning a lot of Python, but this should work.

import re

f = open('hh_remove.txt')
lines = f.readlines()
f.close()

pattern1 = re.compile("heaven", re.I)
pattern2 = re.compile("hell", re.I)

ranges = []    # (start, end) index pairs of the blocks to delete
start = None   # index of the opening line of the current block, if any

for i, line in enumerate(lines):
    if start is None:
        if pattern1.search(line) is not None:
            start = i
    elif pattern2.search(line) is not None:
        ranges.append((start, i + 1))
        start = None

# Delete from the end so the earlier indices stay valid
# (deleting inside the loop above would shift the list under enumerate)
for start, end in reversed(ranges):
    del lines[start:end]

out = open('hh_remove.txt', 'w')
out.write("".join(lines))
out.close()

Upvotes: 0

wescpy

Reputation: 11167

I apologize but this sounds like a homework problem. We have a policy on these: https://meta.stackexchange.com/questions/10811/homework-on-stackoverflow

However, what I can say is that the feature @nosklo wrote about is available in any Python 2.5.x (or newer), but you need to learn enough Python to enable it. :-)

My solution would involve creating a new string with the undesired stuff stripped out, using str.find() or str.index() (or some relative of those two).
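
A minimal sketch of that idea, assuming the whole file fits in memory and that the marker lines themselves should be removed along with everything between them. The function name `remove_between` and its parameters are hypothetical, not part of any library. If the closing marker never appears, the text is left untouched.

```python
def remove_between(text, start_marker, end_marker):
    """Remove each block from the line containing start_marker
    through the line containing end_marker (hypothetical helper)."""
    pieces = []
    pos = 0  # start of the text not yet copied; always at a line start
    while True:
        start = text.find(start_marker, pos)
        if start == -1:
            break  # no further opening marker
        end = text.find(end_marker, start + len(start_marker))
        if end == -1:
            break  # no closing marker: keep the rest unchanged
        # Widen the span to whole lines
        line_start = text.rfind("\n", 0, start) + 1  # 0 if on the first line
        line_end = text.find("\n", end)
        if line_end == -1:
            line_end = len(text)  # closing marker is on the last line
        else:
            line_end += 1         # include the newline itself
        pieces.append(text[pos:line_start])
        pos = line_end
    pieces.append(text[pos:])
    return "".join(pieces)
```

Reading the file with `read()`, passing the result through `remove_between(data, "heaven", "hell")`, and writing the returned string back would apply it to the file from the question.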

Best of luck!

Upvotes: -1
