yart

Reputation: 7805

Getting number from the string using regexp

I have the following html line

<b>String :</b></b></td><td class="title">14</td>

I'm trying to parse it in order to get the number only. It looks simple, but "s/^.*\(:digit:\).*$/\1/" outputs the whole line. I also tried "s/^.*\(\d+\).*$/\1/", but it returns the same result.

If I try the "s/^.*String.*>\(.*\)<.*$/\1/" command, it returns what is needed, but "s/^.*String.*>\(\d+\)<.*$/\1/" again returns the whole line.

Is it possible to get the number from this string using a group that matches only digits?

Edit: I need it for Java. The example here is just for getting a working regular expression, which I test using the sed command.

Thank you.

Upvotes: 1

Views: 311

Answers (5)

Mike Caron

Reputation: 14561

Although you don't explain what language you're using, the answer is simple.

When your pattern contains capturing groups (parentheses), a match gives you multiple results.

The first one, group #0, is always the whole match. Since you have .* before and after the digits, the extra HTML is included in that result.

However, in the second result, group #1, you should have only the number. The way to retrieve this group varies depending on the language, but if you update your question, we may be able to help you in that regard.
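To make the difference between group #0 and group #1 concrete, here is a minimal Java sketch. The class name `GroupDemo`, the helper `groups`, and the exact pattern are assumptions made for illustration; they are not taken from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupDemo {
    // Hypothetical helper: returns {group 0, group 1} for the first match,
    // or null when the pattern does not match at all.
    static String[] groups(String input) {
        // .* on both sides of a single capturing group, as in the question
        Matcher m = Pattern.compile(".*>(\\d+)<.*").matcher(input);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(0), m.group(1) };
    }

    public static void main(String[] args) {
        String html = "<b>String :</b></b></td><td class=\"title\">14</td>";
        String[] g = groups(html);
        System.out.println("group 0: " + g[0]); // the whole matched text
        System.out.println("group 1: " + g[1]); // only the captured digits
    }
}
```

Because the surrounding .* swallow the rest of the line, group 0 is the entire input here, while group 1 holds just the digits.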

Edit:

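import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static Integer extractNumber(String input) {
    // A plain Java regex: no sed-style s/.../ wrapper around it
    Pattern p = Pattern.compile("(\\d+)");

    Matcher m = p.matcher(input);

    if (m.find()) {
        // Group 1 holds only the digits
        return Integer.parseInt(m.group(1));
    }

    // No number found in the input
    return null;
}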

This extracts the first number it finds in the input text, and it also demonstrates how to use groups.

I haven't tested it since I don't have a proper Java environment set up at the moment, but it looks okay. Let me know if you have any problems.

Upvotes: 0

cristian

Reputation: 8744

The regex (?:<(?:[^>])+>)(\d+)(?:(?:<\/[^>]+)+>) captures only the numbers in your text that are between HTML tags.
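Since the question asks for Java, here is a rough sketch of how that regex might be used there. The class name `TagNumber` and the method `between` are made up for this example, and the backslashes are doubled only because the pattern sits inside a Java string literal.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagNumber {
    // The regex from this answer, escaped for a Java string literal
    private static final Pattern BETWEEN_TAGS =
            Pattern.compile("(?:<(?:[^>])+>)(\\d+)(?:(?:<\\/[^>]+)+>)");

    // Hypothetical helper: returns the digits found between an opening
    // and a closing tag, or null when there is no such match.
    static String between(String input) {
        Matcher m = BETWEEN_TAGS.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String html = "<b>String :</b></b></td><td class=\"title\">14</td>";
        System.out.println(between(html));
    }
}
```

Only group 1 is read, so the tags that the non-capturing groups match never end up in the result.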

Upvotes: 0

The Archetypal Paul

Reputation: 41749

I think you have a slightly peculiar regex implementation. What's the environment?

   s/^[^\d]*\(\d+\)<[^\d]*$/\1/

Has to be worth a go, though. First check whether the set pattern needs [ or \[, and whether it allows character classes (\d). If there are no character classes, 0-9 should do it.
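For the Java side of the question, the same anchored substitution could be sketched with `String.replaceFirst`, using `[0-9]` in place of `\d` as suggested above. The class and method names are invented for this example. As with sed, the input comes back unchanged when the pattern does not match.

```java
public class AnchoredReplace {
    // Hypothetical helper: keeps only the first run of digits, provided it is
    // immediately followed by '<' and no other digits appear on the line.
    static String onlyNumber(String line) {
        // $1 in Java plays the role of \1 in the sed replacement
        return line.replaceFirst("^[^0-9]*([0-9]+)<[^0-9]*$", "$1");
    }

    public static void main(String[] args) {
        String html = "<b>String :</b></b></td><td class=\"title\">14</td>";
        System.out.println(onlyNumber(html));
    }
}
```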

Upvotes: 0

theChrisKent

Reputation: 15099

In javascript you can do this:

var num = parseInt(someString.replace(/\D/g, ''), 10);
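A rough Java equivalent of the same idea, given that the question targets Java, might look like the sketch below. The names `StripNonDigits` and `digitsOnly` are assumptions for illustration. Note that this approach joins every digit in the input into one number, and `Integer.parseInt` throws a `NumberFormatException` when the input contains no digits at all.

```java
public class StripNonDigits {
    // Hypothetical helper: removes every non-digit character and parses the rest.
    static int digitsOnly(String input) {
        // \D matches any character that is not a digit
        return Integer.parseInt(input.replaceAll("\\D", ""));
    }

    public static void main(String[] args) {
        String html = "<b>String :</b></b></td><td class=\"title\">14</td>";
        System.out.println(digitsOnly(html));
    }
}
```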

Upvotes: 0

Sinan Ünür

Reputation: 118118

Use HTML::TableExtract.

Upvotes: 3
