tnw

Reputation: 13887

Some Regex stuff

I'm just starting to figure out regex and would love some help trying to understand it. I've been using this to help me get started, but am still having some trouble figuring it out.

What I am trying to do is take this text:

<td>8.54/10 over 190 reviews</td>

And pull out the "8.54", so basically anything in between the first ">" and the "/"

Using my noob skills, I came up with this: [0-9].[0-9][0-9], which WILL match that 8.54, and will work for everything BUT 10.00, which I do need to account for.

Can anyone help me refine my expression to apply to that last case as well?

Upvotes: 1

Views: 100

Answers (4)

hlt

Reputation: 6317

This might work:

\>(.*?)/

# (.*?) is a "non-greedy" group which matches as few characters as possible

Then access the actual value using

m.group(1)

where m is the match object returned by re.search (or one of the match objects yielded by re.finditer)

If you want to access the value directly (re.findall), use

(?<=>)(.*?)(?=/)
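A minimal sketch of both approaches in Python, assuming the sample cell from the question. The variable names are illustrative only, and the lookbehind is written as (?<=>), which is the form Python's re module accepts.

```python
import re

html = "<td>8.54/10 over 190 reviews</td>"

# Capture group: the value is read from group 1 of the match object.
m = re.search(r">(.*?)/", html)
rating = m.group(1) if m else None

# Lookarounds: ">" and "/" are checked but not consumed,
# so re.findall returns the bare values directly.
ratings = re.findall(r"(?<=>)(.*?)(?=/)", html)

print(rating, ratings)
```

Note that (.*?) accepts any text between the delimiters, not just numbers, so this only suits input where the cell is known to hold a rating.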

Upvotes: 0

Donal Fellows

Reputation: 137787

\d is often used instead of [0-9] (mnemonically, “digit”) and it's necessary to remember that sometimes fractional numbers are written without any digits before the decimal point. Thus:

(?<=>)(?:\d+(?:\.\d*)?|\.\d+)(?=/)

OK, that's a bit of a complex RE. Here's how it breaks down (in extended form).

(?<= > )          # With a “>” before (but not matched)…
(?:               # … match either this
   \d+            #   at least one digit, followed by…
   (?:            #   …match
      \. \d*      #     a dot followed by any number of digits
   ) ?            #   optionally
|                 # … or this
   \. \d+         #   a dot followed by at least one digit
)                 #
(?= / )           # … and with a “/” afterwards (but not matched)
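In Python, the extended form can be used as-is by compiling with re.VERBOSE, which ignores whitespace and # comments inside the pattern. The following is a sketch under that assumption; the names NUMBER_BEFORE_SLASH and extract_rating are made up for illustration.

```python
import re

# Extended (verbose) form: whitespace and "#" comments in the pattern are ignored.
NUMBER_BEFORE_SLASH = re.compile(
    r"""
    (?<=>)            # preceded by ">" (not part of the match)
    (?:
        \d+           # at least one digit...
        (?:\.\d*)?    # ...optionally followed by a dot and more digits
      |
        \.\d+         # or a dot followed by at least one digit
    )
    (?=/)             # followed by "/" (not part of the match)
    """,
    re.VERBOSE,
)


def extract_rating(cell):
    """Return the number between ">" and "/", or None if there is none."""
    m = NUMBER_BEFORE_SLASH.search(cell)
    return m.group(0) if m else None


print(extract_rating("<td>8.54/10 over 190 reviews</td>"))
print(extract_rating("<td>10.00/10 over 3 reviews</td>"))
```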

Upvotes: 0

fge

Reputation: 121840

Use quantifiers.

You want one or more digits, followed by a dot, followed by one or more digits. A digit can also be written \d, and the "one or more" quantifier is +.

The dot needs to be escaped as it is a regex metacharacter which means "any character". Your regex therefore should be:

\d+\.\d+

Now, beware that a quantifier applies only to atoms. Character classes ([...]), complemented character classes ([^...]) and special character classes (\d, \w...) are atoms; however, if you want to apply a quantifier to more than a single atom, you'll need to group those atoms using the grouping operator, (). For example, (ab)+ will look for one or more occurrences of ab.
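A short Python sketch of both points, assuming the input looks like the table cell in the question. The helper name first_decimal is hypothetical.

```python
import re

RATING = re.compile(r"\d+\.\d+")  # digits, a literal dot, digits


def first_decimal(text):
    """Return the first decimal number found in text, or None."""
    m = RATING.search(text)
    return m.group(0) if m else None


# Grouping lets the quantifier apply to the whole sequence "ab".
grouped = re.fullmatch(r"(ab)+", "ababab")
# Without the group, + applies only to the atom "b".
ungrouped = re.fullmatch(r"ab+", "ababab")

print(first_decimal("<td>8.54/10 over 190 reviews</td>"))
print(first_decimal("<td>10.00/10 over 190 reviews</td>"))
print(grouped is not None, ungrouped is not None)
```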

Upvotes: 8

tnw

Reputation: 13887

Maybe answered my own question. Found this:

[0-9]+(?:.[0-9]*)

It seems to work, does anyone have any changes to this?
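One possible change, sketched in Python: the unescaped dot matches any character (except a newline), so the pattern also accepts strings that are not decimal numbers. The escaped variant below is only a suggestion, and the variable names are illustrative.

```python
import re

unescaped = re.compile(r"[0-9]+(?:.[0-9]*)")   # "." matches any character except a newline
escaped = re.compile(r"[0-9]+(?:\.[0-9]*)?")   # literal dot, fractional part optional

sample = "<td>10.00/10 over 190 reviews</td>"

# Both patterns find the rating in the sample cell...
print(unescaped.search(sample).group(0))
print(escaped.search(sample).group(0))

# ...but only the unescaped one treats "8x54" as a single number.
print(unescaped.search("8x54").group(0))
print(escaped.search("8x54").group(0))
```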

Upvotes: 2
