Reputation: 4571
I have a Python dictionary containing HTML that I would later like to parse using BeautifulSoup, but before parsing I would like to remove the whitespace directly adjacent to tags.
For example:
>>> string = "text <tag>some texts</tag> <tag> text</tag> some text"
>>> remove_whitespace(string)
'text<tag>some texts</tag><tag>text</tag>some text'
Upvotes: 0
Views: 1399
Reputation: 336408
Assuming that you're allowing any kind of tag name, and that tags never contain angle brackets within them, you can quickly solve this with a regex:
>>> import re
>>> string = "text <tag>some texts</tag> <tag> text</tag> some text"
>>> regex = re.compile(r"\s*(<[^<>]+>)\s*")
>>> regex.sub(r"\g<1>", string)
'text<tag>some texts</tag><tag>text</tag>some text'
Explanation:
\s* # Match any number of whitespace characters
( # Match and capture in group 1:
< # - an opening angle bracket
[^<>]+ # - one or more characters except angle brackets
> # - a closing angle bracket
) # End of group 1 (used to restore the matched text later)
\s* # Match any number of whitespace characters
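For reference, here is a minimal sketch of how the pattern could be wrapped into the `remove_whitespace` function the question asks for. It uses `re.VERBOSE` so the explanation above can live inside the pattern itself. The `pages` dictionary and its key are hypothetical stand-ins for the dictionary of HTML mentioned in the question, and the same assumption applies: no angle brackets inside a tag.

```python
import re

# Same pattern as above, written with re.VERBOSE so it can carry comments.
TAG_WITH_PADDING = re.compile(
    r"""
    \s*          # any whitespace before the tag
    (<[^<>]+>)   # group 1: the tag itself, with no angle brackets inside
    \s*          # any whitespace after the tag
    """,
    re.VERBOSE,
)


def remove_whitespace(html):
    """Strip whitespace directly adjacent to tags (name taken from the question)."""
    # Replace each match with just the captured tag, dropping the padding.
    return TAG_WITH_PADDING.sub(r"\g<1>", html)


# Hypothetical dictionary of HTML snippets, standing in for the one in the question.
pages = {"a": "text <tag>some texts</tag> <tag> text</tag> some text"}
cleaned = {key: remove_whitespace(value) for key, value in pages.items()}
print(cleaned["a"])
```

Note that this also strips whitespace that may be significant, for example inside `<pre>` blocks or between inline elements, so it is only appropriate when that spacing does not matter to you.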
Upvotes: 1