Reputation: 13
So, I have a site that has an XML string, and I'd like my program to return a list of strings that appear between two strings. Here's my code:
response = requests.get(url)
artists=re.findall(re.escape('<name>')+'(.*?)'+re.escape('</name>'),str(response.content))
print(artists)
This returns a list of strings. The problem is, some strings have unwanted characters in them. For example, one of the strings in the list is "Somethin\\' \\'Bout A Truck" and I'd like it to be 'Somethin' 'Bout A Truck'.
Thanks in advance.
Upvotes: 1
Views: 81
Reputation: 881595
Those escapes (single backslashes, each displayed as \\) may be "unwanted" from your viewpoint, but they're no doubt "present" in the response you received. So if characters are present but unwanted, you can remove them, e.g. by using, in lieu of str(response.content),
str(response.content).replace('\\', '')
if what you actually want to do is remove all such escapes (if you want to do something different than that you'd better explain what it is:-).
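To make that concrete, here is a minimal sketch of the replace-then-extract approach. The `raw` string is a made-up stand-in for `str(response.content)` (the real response isn't shown in the question), and the regex is the one from the question:

```python
import re

# Hypothetical sample standing in for str(response.content).
raw = "<name>Somethin\\' \\'Bout A Truck</name><name>Plain Title</name>"

# Remove every backslash first, then pull out the text between the tags.
cleaned = raw.replace('\\', '')
artists = re.findall(re.escape('<name>') + '(.*?)' + re.escape('</name>'), cleaned)
print(artists)
```

Note that this drops all backslashes, including any that are legitimately part of a name, so it only fits if you really want none of them.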
BeautifulSoup4, as recommended in the accepted answer, though a nice package indeed, does not wantonly remove characters present in the input -- it can't read your mind, so it can't know what's "unwanted" to you. E.g.:
>>> import bs4
>>> s = '<name>Somethin\\\' \\\'Bout A Truck</name>'
>>> soup = bs4.BeautifulSoup(s, 'html.parser')
>>> print(soup)
<name>Somethin\' \'Bout A Truck</name>
>>>
As you see, the escapes (backslashes) are still there before the single-quotes.
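The same holds for other parsers, so the clean-up has to be an explicit step after parsing. A rough sketch, swapping in the standard library's `xml.etree.ElementTree` for bs4 so it runs without extra installs; the `<artists>` wrapper and the sample text are assumptions, not taken from the question:

```python
import xml.etree.ElementTree as ET

# Hypothetical document; only the <name> element comes from the question.
xml_text = (
    "<artists>"
    "<name>Somethin\\' \\'Bout A Truck</name>"
    "<name>Plain Title</name>"
    "</artists>"
)

root = ET.fromstring(xml_text)

# The parser hands back the text exactly as it appears, backslashes included.
parsed = [el.text for el in root.iter('name')]

# Stripping the backslashes is a separate, deliberate step.
artists = [name.replace('\\', '') for name in parsed]

print(parsed)
print(artists)
```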
Upvotes: 1
Reputation: 357
I think Beautiful Soup (bs4) will solve this problem, and it also supports newer Python versions such as 3.4.
Upvotes: 1