Reputation: 13887
I'm just starting to figure out regex and would love some help trying to understand it. I've been using this to help me get started, but am still having some trouble figuring it out.
What I am trying to do is take this text:
<td>8.54/10 over 190 reviews</td>
And pull out the "8.54", so basically anything in between the first ">" and the "/"
Using my noob skills, I came up with this: [0-9].[0-9][0-9], which WILL match that 8.54, and will work for everything BUT 10.00, which I do need to account for.
Can anyone help me refine my expression to apply to that last case as well?
Upvotes: 1
Views: 100
Reputation: 6317
This might work:
\>(.*?)/
# (.*?) is a "non-greedy" group which maches as few characters as possible
Then access the actual value using
m.group(1)
where m is the match object returned by re.search or re.finditer
If you want to access the value directly (re.findall), use
(?>=\>)(.*?)(?=/)
Upvotes: 0
Reputation: 137787
\d
is often used instead of [0-9]
(mnemonically, “digit”) and it's necessary to remember that sometimes fractional numbers are written without any digits before the decimal point. Thus:
(?<=>)(?:\d+(?:\.\d*)?|\.\d+)(?=/)
OK, that's a bit of a complex RE. Here's how it breaks down (in extended form).
(?<= > ) # With a “>” before (but not matched)…
(?: # … match either this
\d+ # at least one digit, followed by…
(?: # …match
\. \d* # a dot followed by any number of digits
) ? # optionally
| # … or this
\. \d+ # a dot followed by at least one digit
) #
(?= / ) # … and with a “/” afterwards (but not matched)
Upvotes: 0
Reputation: 121840
Use quantifiers.
You want one or more digits, followed by a dot, followed by one or more digits. A digit can also be written \d
, and the "one or more" quantifier is +
.
The dot needs to be escaped as it is a regex metacharacter which means "any character". Your regex therefore should be:
\d+\.\d+
Now, beware that a quantifier applies to atoms only. Character classes ([...]
), complemented character classes ([^...]
) and special character classes (\d
, \w
...) are atoms, however if you want to apply a quantifier to more than a simple atom, you'll need to group these atoms using the grouping operator, ()
. Ie, (ab)+
will look for one or more of ab
.
Upvotes: 8
Reputation: 13887
Maybe answered my own question. Found this:
[0-9]+(?:.[0-9]*)
It seems to work, does anyone have any changes to this?
Upvotes: 2