Jet Holt

Reputation: 25

Python - reading between parentheses as a single 'block'

I'm attempting to write a program in Python which performs various tasks on a piece of code. I have done most of these, but one is perplexing me. I don't know enough of the jargon to run an effective search for help with this problem, so I am resorting to asking here.

I need to create a process which reads anything between parentheses (the curly braces in the example below) as a single 'block'. Then, if a 'block' contains a specific word or phrase, the Python code would delete it.

Example (simplified) Text File contents:

...
entity
{
    "id" "38794"
    "classname" "info_player_teamspawn"
}
entity
{
    "id" "38795"
    "classname" "func_detail"
    solid
}
entity
{
    "id" "38796"
    "classname" "path_track"
}
...

The real file would list many thousands of these entities. I want the Python code to delete every block (including its 'entity' prefix) whose parentheses contain the word 'solid'. For the example above, this would be the result:

...
entity
{
    "id" "38794"
    "classname" "info_player_teamspawn"
}
entity
{
    "id" "38796"
    "classname" "path_track"
}
...

The id would not need to be corrected. We do not need to worry about that.

I hope I explained my problem well enough, and I hope a solution is possible. If anyone could point me to a glossary of jargon I could use to explain or research any further problems I may have, that would be appreciated too!

Many thanks in advance!

Upvotes: 2

Views: 1438

Answers (5)

georg

Reputation: 215009

First, let's write a generator that yields titles ("entity") and their respective blocks:

def blocks(filename):
    title, block = '', None
    with open(filename) as fp:
        for line in fp:
            if '{' in line:
                block = line
            elif block is not None:
                block += line
            else:
                title = line
            if '}' in line:
                yield title, block
                title, block = '', None

Then read the blocks and output those passing the test:

for title, block in blocks('input.txt'):
    if 'solid' not in block:
        # title and block already end with newlines, so suppress print's own
        print(title + block, end='')

Upvotes: 1

jsvk

Reputation: 1729

Here's a non-regex solution. It might be a little more verbose, but also more intuitive.

def filter_block(instream, outstream, keyword):
    block_buffer = []
    in_block = False
    dump_block = False
    for line in instream:                 # <- Iterate through the lines of the input
        line = line.rstrip()
        block_buffer.append(line)         # <- Keep the block of text in memory

        line_text = line.strip()
        if line_text == "{":
            in_block = True
        elif line_text == keyword and in_block:            # <- Check if this block
            dump_block = True                              #    needs to be dumped
        elif line_text == "}":
            if not dump_block:                                    # <- If not,
                outstream.write("\n".join(block_buffer) + "\n")   # <- keep it.

            block_buffer = []                              # <- Flush buffer, continue
            in_block = dump_block = False

    if block_buffer:                                       # <- Keep any lines left over
        outstream.write("\n".join(block_buffer) + "\n")    #    after the last block


# Text mode, so lines compare equal to "{" and "}"; b.txt receives the output
with open("a.txt") as infile, open("b.txt", "w") as outfile:
    filter_block(infile, outfile, "solid")

Upvotes: 0

Hans Then

Reputation: 11322

It is possible to do everything using a single regular expression. However, that quickly becomes unreadable, especially as you span multiple lines (and I guess you may have other patterns you might want to remove).

I would split the problem in two:

First, find all the entity blocks using this regular expression:

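import re

# re.DOTALL lets '.' match newlines, so the pattern can span a multi-line block
p = re.compile(r'entity\s*\{(.*?)\}', re.DOTALL)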

Then define a substitute function that does the replacement.

def remove_solid(match):
    text = match.group(0)      # the whole matched block, including 'entity'
    if 'solid' in text:
        return ''
    else:
        return text

Hook these two together like this:

output = p.sub(remove_solid, input)
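Putting the two steps together, a minimal self-contained sketch might look like the one below. The names `ENTITY_RE` and `remove_solid_entities` and the sample text are illustrative, not part of the question. It assumes blocks are never nested and that a `}` never appears inside a quoted value; note that the keyword test is a plain substring check, so a value such as "func_solid" would also trigger removal.

```python
import re

# Assumption: blocks are not nested, so the first '}' closes the entity.
# The trailing \n? also consumes the newline that follows the block.
ENTITY_RE = re.compile(r'entity\s*\{(.*?)\}\n?', re.DOTALL)


def remove_solid_entities(text, keyword='solid'):
    """Return text without the entity blocks whose body contains keyword."""
    def replace(match):
        body = match.group(1)               # text between the braces
        return '' if keyword in body else match.group(0)
    return ENTITY_RE.sub(replace, text)


# Sample data modelled on the question's example
sample = (
    'entity\n{\n    "id" "38794"\n    "classname" "info_player_teamspawn"\n}\n'
    'entity\n{\n    "id" "38795"\n    "classname" "func_detail"\n    solid\n}\n'
    'entity\n{\n    "id" "38796"\n    "classname" "path_track"\n}\n'
)

result = remove_solid_entities(sample)
print(result)
```

For a file, read it with `open(...).read()`, pass the string through the function, and write the result back out.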

Upvotes: 1

Kent

Reputation: 195179

How about:

re.sub(r"entity\s*{[^}]*solid\s*}", '', yourString)
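One thing to be aware of: this pattern only matches when `solid` is the last thing before the closing brace, as it is in the question's sample. The sketch below illustrates the difference using two made-up blocks. `STRICT` is the pattern above with the braces escaped; `RELAXED` is a hypothetical variant (not from the answer) that accepts `solid` anywhere in the block. Both assume blocks are not nested.

```python
import re

# Pattern from the answer (braces escaped for clarity)
STRICT = re.compile(r'entity\s*\{[^}]*solid\s*\}')
# Hypothetical variant: 'solid' may be followed by other lines in the block
RELAXED = re.compile(r'entity\s*\{[^}]*solid[^}]*\}')

# Made-up sample blocks
block_last = 'entity\n{\n    "id" "1"\n    solid\n}'     # 'solid' is the last line
block_first = 'entity\n{\n    solid\n    "id" "2"\n}'    # 'solid' is the first line

strict_last = STRICT.sub('', block_last)       # block removed
strict_first = STRICT.sub('', block_first)     # no match, block left untouched
relaxed_first = RELAXED.sub('', block_first)   # block removed
```

If the real file can list `solid` before other keys, the relaxed form is probably the safer choice.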

Upvotes: 0

pogo

Reputation: 1550

You can use a regular expression (regex) to search for the following pattern and replace the matched text with an empty string (or with a line break or space, if you prefer).

import re

[...]
output = re.sub(r'entity\n{[\w\s\n"]*solid[\w\s\n"]*\n}\n', '', input)
[...]

Upvotes: 0
