user1417933

Reputation: 835

regex grabbing too much info

My script:

import re

def fetch_online():
    # data holds the page source fetched elsewhere in the script
    pattern = re.search('(<span class="on">)(.*)(</span>)', data)
    return pattern.group(2)

print(fetch_online())

Inside data, there is one line that contains this:

        <b><span><span class="on">5879</span> users online</span></b>

However, when run, the output is this:

5879</span> users online

How should I fix this so it only grabs the data before the first </span>?

Upvotes: 0

Views: 205

Answers (2)

creemama

Reputation: 6665

Use the non-greedy quantifier: (<span class="on">)(.*?)(</span>).

To learn more about the non-greedy quantifier, read the "Laziness Instead of Greediness" section at Regular-Expressions.info.
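As a rough illustration of the difference, the sketch below runs both patterns against the line quoted in the question. The name `data` follows the question; filling it with just that one line is an assumption, since the real input is presumably a whole page.

```python
import re

# Assumed sample input: only the single line shown in the question.
data = '<b><span><span class="on">5879</span> users online</span></b>'

# Greedy: .* first runs to the end of the line, then backtracks
# until a </span> matches, which is the LAST one on the line.
greedy = re.search(r'<span class="on">(.*)</span>', data)
print(greedy.group(1))   # 5879</span> users online

# Non-greedy: .*? grows one character at a time and stops
# as soon as the FIRST </span> can match.
lazy = re.search(r'<span class="on">(.*?)</span>', data)
print(lazy.group(1))     # 5879
```

The only change between the two patterns is the `?` after `.*`. The extra capture groups around the opening and closing tags in the original pattern are not needed to extract the number, so they are dropped here.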

Just to reiterate what has already been said in the comments, parsing HTML using regular expressions is highly discouraged.
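For comparison, here is a minimal sketch of the same extraction done with a parser instead of a regex. It assumes Python 3 and uses only the standard library's `html.parser`; the class name `OnlineCounter` and its attributes are made up for this example and do not come from the question.

```python
from html.parser import HTMLParser

class OnlineCounter(HTMLParser):
    """Hypothetical helper: collects the text inside <span class="on"> elements."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # > 0 while inside a target span
        self.values = []    # one string per target span found

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        if self.depth:
            self.depth += 1             # nested span inside the target
        elif ("class", "on") in attrs:
            self.depth = 1              # entering a target span
            self.values.append("")

    def handle_endtag(self, tag):
        if tag == "span" and self.depth:
            self.depth -= 1

    def handle_data(self, text):
        if self.depth:
            self.values[-1] += text

# Assumed sample input: the single line shown in the question.
data = '<b><span><span class="on">5879</span> users online</span></b>'
parser = OnlineCounter()
parser.feed(data)
parser.close()
print(parser.values)   # ['5879']
```

This is longer than the regex, but it keeps track of nesting, so it does not depend on where the closing tags happen to fall on the line.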

Upvotes: 3

dda

Reputation: 6203

In your specific case here, go for <span class="on">(\d+)</span>. For a more general approach, go for non-greedy:

<span class="on">(.*?)</span>
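A possible way to use the digits-only pattern is sketched below. Passing `data` in as a parameter, returning an `int`, and returning `None` when nothing matches are all choices made for this example, not part of the original script.

```python
import re

# \d+ matches only digits, so the match cannot run past the closing tag.
ONLINE_RE = re.compile(r'<span class="on">(\d+)</span>')

def fetch_online(data):
    """Return the online-user count as an int, or None if it is not found."""
    match = ONLINE_RE.search(data)
    return int(match.group(1)) if match else None

# Assumed sample input: the single line shown in the question.
sample = '<b><span><span class="on">5879</span> users online</span></b>'
print(fetch_online(sample))                         # 5879
print(fetch_online('<span class="on">n/a</span>'))  # None
```

Checking for `None` before calling `group()` avoids the `AttributeError` that the original function would raise if the page layout changed and the pattern stopped matching.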

Upvotes: 4
