Reputation: 25
I'm attempting to write a program in Python which does various tasks to a piece of code. I have done most of these, but one is perplexing me. I don't know enough of the jargon to be able to run an effective search for help with this problem, so I am resorting to asking here.
I need to create a process which reads anything in between curly braces as a single 'block'. Then, if a 'block' contains a specific word or phrase, the Python code would delete it.
Example (simplified) Text File contents:
...
entity
{
"id" "38794"
"classname" "info_player_teamspawn"
}
entity
{
"id" "38795"
"classname" "func_detail"
solid
}
entity
{
"id" "38796"
"classname" "path_track"
}
...
In the real file there would be many thousands of these entities. I want the Python code to delete every block (the braces, everything inside them, and the 'entity' preface) whose braces contain the word 'solid', i.e. this would be the resulting piece:
...
entity
{
"id" "38794"
"classname" "info_player_teamspawn"
}
entity
{
"id" "38796"
"classname" "path_track"
}
...
The id would not need to be corrected. We do not need to worry about that.
I hope I explained my problem well enough, and I hope there is a solution possible. If anyone would like to point me to a glossary of jargon I could use to help explain or research any further problems I may have, that would be appreciated too!
Many thanks in advance!
Upvotes: 2
Views: 1438
Reputation: 215009
First, let's write a generator that yields titles ("entity") and their respective blocks:
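def blocks(filename):
    title, block = '', None
    with open(filename) as fp:
        for line in fp:
            if '{' in line:
                block = line            # opening brace starts a new block
            elif block is not None:
                block += line           # inside a block: keep collecting lines
            else:
                title = line            # outside a block: remember the title line
            if '}' in line:
                yield title, block      # closing brace ends the block
                title, block = '', None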
Then read the blocks and output those passing the test:
for title, block in blocks('input.txt'):
    if 'solid' not in block:
        # title and block already end with newlines, so suppress print's own
        print(title + block, end='')
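As a hedged sketch of how the two steps above might be combined and run without a file on disk: `iter_blocks` and `drop_blocks` are hypothetical names, not part of the answer, and the whole-word test via `\b` is an added assumption so that a word like "solidity" would not trigger a deletion. It also assumes blocks are not nested.

```python
import io
import re

def iter_blocks(lines):
    """Yield (title, block) pairs from any iterable of lines (assumes no nested braces)."""
    title, block = '', None
    for line in lines:
        if '{' in line:
            block = line
        elif block is not None:
            block += line
        else:
            title = line
        if '}' in line and block is not None:
            yield title, block
            title, block = '', None

def drop_blocks(lines, word):
    """Return the text with every block that contains `word` as a whole word removed."""
    pattern = re.compile(r'\b' + re.escape(word) + r'\b')
    return ''.join(title + block
                   for title, block in iter_blocks(lines)
                   if not pattern.search(block))

# The three entities from the question, held in memory instead of a file.
sample = io.StringIO(
    'entity\n{\n"id" "38794"\n"classname" "info_player_teamspawn"\n}\n'
    'entity\n{\n"id" "38795"\n"classname" "func_detail"\nsolid\n}\n'
    'entity\n{\n"id" "38796"\n"classname" "path_track"\n}\n'
)
result = drop_blocks(sample, 'solid')
print(result, end='')
```

Passing an open file object instead of the `StringIO` should work the same way, since both iterate line by line.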
Upvotes: 1
Reputation: 1729
Here's a non-regex solution. It might be a little more verbose, but also more intuitive.
def filter_block(instream, outstream, keyword):
    block_buffer = []
    in_block = False
    dump_block = False
    for line in instream:                  # <- Iterate through the lines of the input
        line = line.rstrip()
        block_buffer.append(line)          # <- Keep the block of text in memory
        line_text = line.strip()
        if line_text == "{":
            in_block = True
        elif line_text == keyword and in_block:   # <- Check if this block
            dump_block = True                     #    needs to be dumped
        elif line_text == "}":
            if not dump_block:             # <- If not, keep it (with a trailing newline).
                outstream.write("\n".join(block_buffer) + "\n")
            block_buffer = []              # <- Flush buffer, continue
            in_block = dump_block = False
    if block_buffer and not dump_block:    # <- Keep any lines left after the last block
        outstream.write("\n".join(block_buffer) + "\n")

# Text mode, so lines compare equal to "{" and "}"; b.txt is created for output
with open("a.txt") as infile, open("b.txt", "w") as outfile:
    filter_block(infile, outfile, "solid")
Upvotes: 0
Reputation: 11322
It is possible to do everything using a single regular expression. However, that quickly becomes unreadable, especially as you span multiple lines (and I guess you may have other patterns you might want to remove).
I would split the problem in two:
First, find all the entity blocks using this regular expression:
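import re
# DOTALL lets '.' match newlines, so one match can span a whole multi-line block
p = re.compile(r'entity\s*\{(.*?)\}\n?', re.DOTALL)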
Then define a substitute function that does the replacement.
def remove_solid(match):
    text = match.group(0)  # the whole matched block, 'entity' included
    if 'solid' in text:
        return ''
    else:
        return text
Hook these two together like this, where input is a string holding the whole file's contents:
output = p.sub(remove_solid, input)
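For illustration, here is one way the two pieces might look end to end on the question's sample data. This is a sketch under assumptions: blocks are not nested, `re.DOTALL` is added so that `.` can cross line breaks, and `ENTITY` and `remove_if_contains` are hypothetical names introduced here.

```python
import re

# DOTALL lets '.' match newlines so one match can span a whole block;
# the trailing \n? also consumes the line break after the closing brace.
ENTITY = re.compile(r'entity\s*\{(.*?)\}\n?', re.DOTALL)

def remove_if_contains(word):
    """Build a callback for re.sub that deletes blocks whose body contains `word`."""
    def callback(match):
        body = match.group(1)            # text between the braces
        return '' if word in body else match.group(0)
    return callback

text = (
    'entity\n{\n"id" "38794"\n"classname" "info_player_teamspawn"\n}\n'
    'entity\n{\n"id" "38795"\n"classname" "func_detail"\nsolid\n}\n'
    'entity\n{\n"id" "38796"\n"classname" "path_track"\n}\n'
)
cleaned = ENTITY.sub(remove_if_contains('solid'), text)
print(cleaned, end='')
```

Building the callback from a function that takes the word makes it easy to reuse the same compiled pattern for other words you may want to filter on.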
Upvotes: 1
Reputation: 1550
You can use a regular expression (regex) to search for the following pattern and replace the matched text. The call below replaces it with an empty string; use a line break or a space instead if you prefer.
import re
[...]
output = re.sub(r'entity\n{[\w\s\n"]*solid[\w\s\n"]*\n}\n', '', input)
[...]
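One caveat worth a sketch: the character class `[\w\s\n"]` does not match characters such as `-` or `.`, so a block holding a value like "-128 0 64.5" would not be removed. A possible variant, assuming blocks are never nested, is to accept any character except braces. The name `SOLID_BLOCK` and the sample values below are made up for illustration.

```python
import re

# [^{}]* accepts anything except braces, so the match cannot leak into a
# neighbouring block; \b keeps 'solid' from matching inside longer words.
SOLID_BLOCK = re.compile(r'entity\n\{[^{}]*\bsolid\b[^{}]*\}\n')

text = (
    'entity\n{\n"id" "1"\n"origin" "-128 0 64.5"\nsolid\n}\n'
    'entity\n{\n"id" "2"\n"classname" "path_track"\n}\n'
)
output = SOLID_BLOCK.sub('', text)
print(output, end='')
```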
Upvotes: 0