Reputation: 53
I have lists like this:
boo = ['<a>', '<b>', '<c>', '</c>', '</b>', '</a>']
I'm trying to iterate over them and find matching indexes like '<c>', '</c>' and remove those pieces. They have to be next to each other and matching in order to be removed. After the indices are removed, it iterates over the list again and keeps removing until the list is empty or until it cannot anymore.
I'm thinking something like:
for i in range(len(boo)):
    for b in boo:
        if boo[i]== '</'+ b +'>' and boo[i-1] == '<' + b +'>':
            boo.remove(boo[i])
            boo.remove(boo[i-1])
print(boo)
but that doesn't appear to be doing anything. Can someone point me to my problem?
EDIT
I changed it to more like this, but it is saying i is not defined. How is what I have not defining i?
def valid_html1(test_strings):
    valid = []
    for h in test_strings:
        boo = re.findall('\W+\w+\W', h)
        while i in boo == boo[i]:
            if boo[i][1:] == boo[i+1][2:]:
                boo.remove(boo[i])
                boo.remove(boo[i+1])
        print(boo)
valid_html1(example_set)
Upvotes: 0
Views: 88
Reputation: 106445
You should parse the strings to extract the tag names from the angle brackets before you make comparisons. You can use zip to pair adjacent tags, and keep appending each item to a new list unless the item next to it is a closing tag of the same name:
boo = ['<a>', '<b>', '<c>', '</c>', '</b>', '</a>']
while True:
    pairs = zip(boo, boo[1:] + [''])
    new_boo = []
    for a, b in pairs:
        if a.startswith('<') and a.endswith('>') and \
                b.startswith('</') and b.endswith('>') and a[1:-1] == b[2:-1]:
            next(pairs)
            boo = new_boo
            boo.extend(a for a, _ in pairs)
            break
        new_boo.append(a)
    else:
        break
print(boo)
This outputs:
[]
And if boo = ['<a>', '<b>', '<c>', '</c>', '</b>', '</a>', '<d>'], this outputs:
['<d>']
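As an alternative to the loop above (a different technique, not the one this answer uses), the same "remove adjacent pairs until nothing changes" result can be reached in a single pass with a stack. This is only a minimal sketch: the function name reduce_tags is hypothetical, and it assumes every item is a well-formed '<name>' or '</name>' string.

```python
def reduce_tags(tags):
    """Drop adjacent '<x>', '</x>' pairs, including pairs exposed by earlier removals."""
    stack = []
    for tag in tags:
        # A closing tag cancels the tag on top of the stack only if that
        # tag is the matching opener, e.g. '<c>' directly before '</c>'.
        if stack and tag.startswith('</') and stack[-1] == '<' + tag[2:]:
            stack.pop()
        else:
            stack.append(tag)
    return stack

print(reduce_tags(['<a>', '<b>', '<c>', '</c>', '</b>', '</a>']))         # → []
print(reduce_tags(['<a>', '<b>', '<c>', '</c>', '</b>', '</a>', '<d>']))  # → ['<d>']
```

Because a removed pair exposes the tag that sat before it on top of the stack, newly adjacent pairs are handled without rescanning the list.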
Upvotes: 1
Reputation: 913
In 99% of cases you shouldn't edit a list while iterating over it.
This solution makes a copy and then edits the original list:
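removed = True
while removed:  # repeat until a full pass removes nothing
    removed = False
    boo_copy = boo[:]  # iterate over a snapshot, edit the original
    for i, b in enumerate(boo_copy):
        if i == 0:
            continue
        stripped_tag = b.replace("</","").replace(">","").replace("<","")  # Strips '</', '<' and '>' to leave the bare tag name
        if boo_copy[i] == '</' + stripped_tag + '>' and boo_copy[i-1] == '<' + stripped_tag + '>':
            boo.remove(boo_copy[i])
            boo.remove(boo_copy[i-1])
            removed = True
            break  # the snapshot is stale after a removal, so take a new one
print(boo)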
This assumes that the tags are unique in the list.
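To illustrate why editing a list while iterating over it is risky, here is a small sketch with made-up data (the names skipped and cleaned are only for illustration). Removing an item shifts the later elements left, so the loop steps over the element that moved into the freed slot.

```python
# Removing from the list being iterated: the second '<x>' is never visited.
skipped = ['<a>', '<x>', '<x>', '<b>']
for item in skipped:
    if item == '<x>':
        skipped.remove(item)  # shifts later elements left under the loop
print(skipped)  # → ['<a>', '<x>', '<b>']

# Iterating over a copy visits every original element.
cleaned = ['<a>', '<x>', '<x>', '<b>']
for item in cleaned[:]:
    if item == '<x>':
        cleaned.remove(item)
print(cleaned)  # → ['<a>', '<b>']
```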
Upvotes: 0
Reputation: 27201
import re

def open_tag_as_str(tag):
    m = re.match(r'^<(\w+)>$', tag)
    return None if m is None else m.group(1)

def close_tag_as_str(tag):
    m = re.match(r'^</(\w+)>$', tag)
    return None if m is None else m.group(1)

def remove_adjacent_tags(tags):
    def closes(a, b):
        a = open_tag_as_str(a)
        b = close_tag_as_str(b)
        return a is not None and b is not None and a == b

    # This is a bit ugly and could probably be improved with
    # some itertools magic or something
    skip = False
    for i in range(len(tags)):
        if skip:
            skip = False
        elif i + 1 < len(tags) and closes(tags[i], tags[i + 1]):
            skip = True
        else:
            yield tags[i]

boo = ['<a>', '<b>', '<c>', '</c>', '</b>', '</a>']
boo = list(remove_adjacent_tags(boo))
print(boo)
Gives:
['<a>', '<b>', '</b>', '</a>']
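That result is what one pass leaves behind. The question asks to keep removing until nothing more can be removed, which could be done by repeating a pass until the list stops shrinking. The sketch below is self-contained and uses hypothetical names (remove_pairs_once, remove_until_stable); it is one possible way to do this, assuming tag names consist of \w characters only.

```python
import re

def remove_pairs_once(tags):
    """One left-to-right pass that drops each adjacent '<x>', '</x>' pair."""
    result = []
    i = 0
    while i < len(tags):
        opener = re.fullmatch(r'<(\w+)>', tags[i])
        closer = re.fullmatch(r'</(\w+)>', tags[i + 1]) if i + 1 < len(tags) else None
        if opener and closer and opener.group(1) == closer.group(1):
            i += 2  # skip both tags of the matched pair
        else:
            result.append(tags[i])
            i += 1
    return result

def remove_until_stable(tags):
    """Repeat the single pass until it no longer shortens the list."""
    while True:
        reduced = remove_pairs_once(tags)
        if len(reduced) == len(tags):
            return reduced
        tags = reduced

print(remove_until_stable(['<a>', '<b>', '<c>', '</c>', '</b>', '</a>']))  # → []
```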
Upvotes: 0