My script:
import re

def fetch_online():
    # data holds the downloaded page HTML (defined elsewhere in the script)
    pattern = re.search('(<span class="on">)(.*)(</span>)', data)
    return pattern.group(2)

print(fetch_online())
Inside data, there is one line that contains this:
<b><span><span class="on">5879</span> users online</span></b>
However, when run, the output is this:
5879</span> users online
How should I fix this so that it only grabs the text before the first </span>?
Use the non-greedy quantifier: (<span class="on">)(.*?)(</span>).
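A minimal sketch of the fix, assuming data holds the sample line from the question. The only change to the pattern is .* becoming .*?; the script is also adjusted (an assumption for illustration) to take the HTML as a parameter and to return None when nothing matches, instead of failing on a missing match.

```python
import re

# Sample input taken from the question; in the real script, data is the fetched page.
data = '<b><span><span class="on">5879</span> users online</span></b>'

def fetch_online(html):
    # .*? is lazy: it expands one character at a time and stops at the
    # first </span>, whereas .* runs on to the last </span> in the line.
    match = re.search(r'(<span class="on">)(.*?)(</span>)', html)
    return match.group(2) if match else None

print(fetch_online(data))  # prints 5879
```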
To learn more about the non-greedy quantifier, read the "Laziness Instead of Greediness" section at Regular-Expressions.info.
Just to reiterate what has already been said in the comments, parsing HTML using regular expressions is highly discouraged.
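As a rough sketch of the parser-based alternative, Python 3's standard-library html.parser can extract the same value without a regex. The class OnlineCountParser and the helper fetch_online(html) are illustrative names, not part of any library, and the sketch assumes the count is the first text inside the first span whose class list contains "on". In Python 2, which the question's print statement suggests, the module was named HTMLParser instead.

```python
from html.parser import HTMLParser

class OnlineCountParser(HTMLParser):
    """Hypothetical helper: records the text inside <span class="on">...</span>."""

    def __init__(self):
        super().__init__()
        self.in_on_span = False
        self.result = None

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; class may list several names.
        classes = (dict(attrs).get("class") or "").split()
        if tag == "span" and "on" in classes:
            self.in_on_span = True

    def handle_endtag(self, tag):
        if tag == "span":
            self.in_on_span = False

    def handle_data(self, text):
        # Keep only the first piece of text seen inside the matching span.
        if self.in_on_span and self.result is None:
            self.result = text.strip()

def fetch_online(html):
    parser = OnlineCountParser()
    parser.feed(html)
    parser.close()
    return parser.result

data = '<b><span><span class="on">5879</span> users online</span></b>'
print(fetch_online(data))  # prints 5879
```

Unlike the regex, this keeps working if attributes are reordered or the class attribute gains extra names such as class="big on".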
In your specific case, go for <span class="on">(\d+)</span>. For a more general approach, use the non-greedy quantifier:
<span class="on">(.*?)</span>
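The digit-only pattern from this answer can be tried like this. It assumes the page always shows the count as plain digits; a value with a thousands separator such as 5,879 would not match at all.

```python
import re

data = '<b><span><span class="on">5879</span> users online</span></b>'

# \d+ matches only digits, so it cannot run past the closing </span>;
# greediness does not matter here because '<' is not a digit.
match = re.search(r'<span class="on">(\d+)</span>', data)
online = int(match.group(1)) if match else None
print(online)  # prints 5879
```

Converting with int() is a side benefit of the stricter pattern: the captured group is guaranteed to be numeric.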