Reputation: 104
I'm parsing HTML that is currently in bytes form by converting each element to a string and then appending part of it to a list. I want to remove all backslashes (or at least handle escape characters cleanly).
Here's my code:
picture_divs = [b'<img alt="Python\'s Confusing me." class="" src="https://link_goes_here" style="whatever;"/>']

def get_alt_text(picture_divs):
    alt_text = []
    for i, elem in enumerate(picture_divs):
        str_elem = str(elem).replace('\\', '')  # Convert bytes -> strings
        start_index = int(str_elem.find('alt='))
        end_index = int(str_elem.find('class='))
        alt_text.append(str_elem[start_index + 4:end_index])
    return alt_text

alt_text_return = get_alt_text(picture_divs)
print(alt_text_return)
Output: ['"Python\'s Confusing me." ']
Desired output: ['"Python's Confusing me." ']
Upvotes: 0
Views: 112
Reputation:
Here is one possible solution to clean it up:
>>> from re import sub
>>> picture_divs = [b'<img alt="Python\'s Confusing me." class="" src="https://link_goes_here" style="whatever;"/>']
>>> for div in picture_divs:
...     # decode() already yields a real apostrophe; strip only literal backslashes so the URL keeps its slashes
...     rev1 = sub(r'\\', '', div.decode('utf-8'))
...     print(rev1)
...
<img alt="Python's Confusing me." class="" src="https://link_goes_here" style="whatever;"/>
>>>
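If the goal is only the alt text, another option is to decode the bytes and let the standard library's html.parser read the attribute, which avoids searching for 'alt=' and 'class=' by hand. This is a minimal sketch, not the only way to do it: the AltTextParser class name is my own, and it assumes the bytes are UTF-8 encoded.

```python
from html.parser import HTMLParser


class AltTextParser(HTMLParser):
    """Hypothetical helper: collects the alt attribute of every <img> tag."""

    def __init__(self):
        super().__init__()
        self.alt_texts = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs with the quotes already removed
        if tag == "img":
            for name, value in attrs:
                if name == "alt":
                    self.alt_texts.append(value)


def get_alt_text(picture_divs):
    parser = AltTextParser()
    for elem in picture_divs:
        parser.feed(elem.decode("utf-8"))  # assumption: the bytes are UTF-8
    parser.close()
    return parser.alt_texts


picture_divs = [b'<img alt="Python\'s Confusing me." class="" src="https://link_goes_here" style="whatever;"/>']
for text in get_alt_text(picture_divs):
    print(text)
```

Printing each element on its own, rather than printing the whole list, shows the text without the surrounding quotes or any escape character.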
Upvotes: 1
Reputation: 109
The output you are asking for is not valid Python syntax. Python displays a list of strings in this form:
list_example = ['a','b']
If you wanted 'Python's Confusing me.' in the list, the apostrophe would close the string that the opening single quote started. Python therefore prints a backslash to escape that quote and avoid a syntax error. The backslash is part of how the list is displayed, not part of the string itself.
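A short sketch that shows the difference, using the value from the question (the variable names here are only for illustration): printing the list shows the repr() of each element, while printing the element itself shows its actual characters.

```python
alt_text = ['"Python\'s Confusing me." ']

# Printing the list uses repr() for each element, so the escaped quote appears
list_view = repr(alt_text)
print(list_view)

# Printing the element itself shows the real characters, with no backslash
element_view = str(alt_text[0])
print(element_view)

# The string contains no backslash character at all
has_backslash = '\\' in alt_text[0]
print(has_backslash)
```

So there is nothing to strip from the data; only the way the list is printed needs to change.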
Upvotes: 1