Luke West

Reputation: 104

Python: Converting a bytes object to a string, removing \'s, then writing to a list brings back the \'s

I'm parsing HTML that's currently in bytes form by converting it to a string and then writing it to a list. I want to remove all backslashes (or even just handle escape characters nicely).

Here's my code:

picture_divs = [b'<img alt="Python\'s Confusing me." class="" src="https://link_goes_here" style="whatever;"/>']

def get_alt_text(picture_divs):
    alt_text = []
    for i, elem in enumerate(picture_divs):
        str_elem = str(elem).replace('\\', '')  # Convert bytes -> strings
        start_index = int(str_elem.find('alt='))
        end_index = int(str_elem.find('class='))
        alt_text.append(str_elem[start_index + 4:end_index])

    return alt_text


alt_text_return = get_alt_text(picture_divs)
print(alt_text_return)

Output: ['"Python\'s Confusing me." ']

Desired output: ['"Python's Confusing me." ']

Upvotes: 0

Views: 112

Answers (2)

user11924970

Reputation:

Here is one possible solution to clean it up:

>>> from re import sub
>>> picture_divs = [b'<img alt="Python\'s Confusing me." class="" src="https://link_goes_here" style="whatever;"/>']
>>> for div in picture_divs:
...     rev1 = sub(r'[\\/]', '', div.decode('utf-8'))
...     rev2 = rev1.replace('\'', "'")
...     print(rev2)
... 
<img alt="Python's Confusing me." class="" src="https:link_goes_here" style="whatever;">
>>> 
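
Note that the pattern `[\\/]` in the transcript above also strips the forward slashes from the URL and the closing `/>`. If the goal is only the alt text, a minimal sketch along the following lines may be enough. It assumes the input is UTF-8 and uses the standard library's `html.parser`; the `AltCollector` class is a hypothetical helper, not something from the question.

```python
from html.parser import HTMLParser

picture_divs = [b'<img alt="Python\'s Confusing me." class="" src="https://link_goes_here" style="whatever;"/>']


class AltCollector(HTMLParser):
    """Hypothetical helper: collects the alt attribute of every <img> tag."""

    def __init__(self):
        super().__init__()
        self.alts = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag
        if tag == "img":
            for name, value in attrs:
                if name == "alt":
                    self.alts.append(value)


def get_alt_text(picture_divs):
    collector = AltCollector()
    for elem in picture_divs:
        # decode() gives the text itself; str(elem) would give "b'...'"
        collector.feed(elem.decode("utf-8"))
    collector.close()
    return collector.alts


alt_text = get_alt_text(picture_divs)
for text in alt_text:
    print(text)
```

Because the bytes are decoded rather than passed through `str()`, there is no `b'...'` wrapper and no backslash to remove, and the URL in `src` is left untouched.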

Upvotes: 1

girishsaraf03

Reputation: 109

The output you are asking for is not valid Python syntax. Python displays lists in the format

list_example = ['a','b']

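If you wanted 'Python's confusing me' displayed in the list, the apostrophe would close the single quote that opened the string. So Python adds the backslash to escape the apostrophe when it displays the string, rather than show something that would be a syntax error. The backslash is part of the display only, not part of the string itself.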
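
A short sketch of the difference, using the list from the question's output: printing the list shows the `repr()` of each element, while printing an element shows the string itself.

```python
alt_text = ['"Python\'s Confusing me." ']

# Printing the list shows each element's repr(), which escapes the apostrophe
print(alt_text)

# Printing the element shows the actual characters, with no backslash
print(alt_text[0])

# The string itself contains no backslash character
has_backslash = "\\" in alt_text[0]
print(has_backslash)
```

The first `print` shows `\'`, the second does not, and the membership check is `False`.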

Upvotes: 1
