Victor Nascimento

Reputation: 781

How to match this expression with regex?

I have a text with some lines (200+) in this format:

10684 - The jackpot ? discuss   Lev 3    --- ? ---

10755 - Garbage Heap    ? discuss   Lev 5    --- ? ---

I want to retrieve the first number (10684 or 10755) only if the number after "Lev" is greater than 3. I can get the first number with this regex: ([0-9]+) - but that ignores the level restriction.

How can this be done?

Thanks in advance.

Upvotes: 0

Views: 87

Answers (5)

Mario Palumbo

Reputation: 985

In bash use this:

var=">3"
perl -lne 'print $1 if /(\d+) - .*Lev (\d+)/ && $2'"$var"

The print is guarded by the match itself, so a line that does not match prints nothing. Keeping the comparison in $var lets you pass the condition as a parameter.
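
For anyone not using Perl, the same idea (capture both numbers, then compare the level numerically) can be sketched in Python. This is only an illustration: the function name ids_above and the LINES list are made up for the example, and the two sample lines are the ones from the question.

```python
import re

# Sample lines copied from the question.
LINES = [
    "10684 - The jackpot ? discuss   Lev 3    --- ? ---",
    "10755 - Garbage Heap    ? discuss   Lev 5    --- ? ---",
]

# Group 1: the leading number; group 2: the number after "Lev".
PATTERN = re.compile(r"(\d+) - .*Lev (\d+)")


def ids_above(lines, threshold):
    """Return the leading numbers of lines whose level exceeds threshold."""
    result = []
    for line in lines:
        match = PATTERN.search(line)
        # Lines that do not match are skipped entirely.
        if match and int(match.group(2)) > threshold:
            result.append(match.group(1))
    return result


print(ids_above(LINES, 3))  # ['10755']
```

Because the comparison is done on an integer rather than inside the regex, the threshold can be changed without touching the pattern.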

Upvotes: 0

tripleee

Reputation: 189477

A bit of Awk trickery:

awk -F '[?] +discuss +Lev' '$2+0 > 3 { split($1, a, " "); print a[1] }' file

Upvotes: 0

pguardiario

Reputation: 54984

A lookahead is really the best option, because the match then contains just the number:

/\d+(?=.*Lev (0*[4-9]|[1-9]\d))/
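
As a quick check, here is a minimal Python sketch of that lookahead (Python's re module supports lookahead assertions). The helper name first_id is just for illustration; the sample lines are taken from the question.

```python
import re

# Same pattern as above, without the surrounding slashes.
LOOKAHEAD = re.compile(r"\d+(?=.*Lev (0*[4-9]|[1-9]\d))")

LINES = [
    "10684 - The jackpot ? discuss   Lev 3    --- ? ---",
    "10755 - Garbage Heap    ? discuss   Lev 5    --- ? ---",
]


def first_id(line):
    """Return the first number on the line if the level qualifies, else None."""
    match = LOOKAHEAD.search(line)
    # group(0) is the whole match, which is only the digits before the lookahead.
    return match.group(0) if match else None


for line in LINES:
    print(first_id(line))
```

The first line (Lev 3) yields None and the second yields 10755. The lookahead only checks the rest of the line, so nothing after the number ends up in the match.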

Upvotes: 0

Herrington Darkholme

Reputation: 6315

(\d+) - .*?Lev (?:[4-9]|[1-9]\d+)

The first \d+ matches the leading number, as you have already done.

The next .*? is a lazy quantifier: it consumes as few characters as possible, and the expression that follows guides it to the right place. (A lazy quantifier is often, though not always, more efficient.)

The second group, (?:[4-9]|[1-9]\d+), matches either a single-digit number greater than 3 or a number of two or more digits without a leading zero.
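
To see the pattern at work over several lines at once, here is a small Python sketch. It assumes the whole text is held in one string; the third line (Lev 12) is invented to exercise the multi-digit branch, and the name TEXT is arbitrary.

```python
import re

PATTERN = re.compile(r"(\d+) - .*?Lev (?:[4-9]|[1-9]\d+)")

# The first two lines come from the question; the third is a made-up example.
TEXT = "\n".join([
    "10684 - The jackpot ? discuss   Lev 3    --- ? ---",
    "10755 - Garbage Heap    ? discuss   Lev 5    --- ? ---",
    "10800 - Deep Vault ? discuss   Lev 12    --- ? ---",
])

# With a single capturing group, findall returns just that group per match.
# "." does not match a newline by default, so a match cannot span two lines.
ids = PATTERN.findall(TEXT)
print(ids)  # ['10755', '10800']
```

The non-capturing group (?:...) is what keeps findall from returning the level as well.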

Stack Overflow doesn't display my image properly, so use this link instead: http://regexr.com?36n5l

Upvotes: 3

Andrew Cheong

Reputation: 30273

Regular expressions don't recognize numbers as numbers (only as strings). You can do this, though:

([0-9]+) - .*Lev (?:[4-9][^0-9]|[1-9][0-9]+)

Basically, we use the alternation operator (|) to accept only a single digit greater than 3 (enforced by checking that the following character is not a digit) or a multi-digit number not beginning with a zero.

In case the level number might be at the end of the line, though, you might have to do this:

([0-9]+) - .*Lev (?:[4-9](?:[^0-9]|$)|[1-9][0-9]+)
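
A short Python sketch of the difference between the two patterns, in case it helps. The names NEEDS_NEXT_CHAR, ALLOWS_EOL and leading_number are made up for the example, and the "trimmed" line is a hypothetical variant of a line from the question, cut off right after the level.

```python
import re

# First pattern: a single digit 4-9 must be followed by a non-digit character.
NEEDS_NEXT_CHAR = re.compile(r"([0-9]+) - .*Lev (?:[4-9][^0-9]|[1-9][0-9]+)")
# Second pattern: the single digit may also be followed by the end of the line.
ALLOWS_EOL = re.compile(r"([0-9]+) - .*Lev (?:[4-9](?:[^0-9]|$)|[1-9][0-9]+)")

full = "10755 - Garbage Heap    ? discuss   Lev 5    --- ? ---"
trimmed = "10755 - Garbage Heap    ? discuss   Lev 5"  # hypothetical variant


def leading_number(pattern, line):
    """Return the captured leading number, or None if the line does not match."""
    match = pattern.search(line)
    return match.group(1) if match else None


print(leading_number(NEEDS_NEXT_CHAR, full))     # 10755
print(leading_number(NEEDS_NEXT_CHAR, trimmed))  # None
print(leading_number(ALLOWS_EOL, trimmed))       # 10755
```

The first pattern misses the trimmed line because [^0-9] has to consume a character, and there is none left after the 5.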

(I'm assuming whatever regex engine you're using can't handle lookaround assertions. In the future, try to always include what language you're using when you're asking a regex question.)


Ah, I just read your edit that the number is always less than 10. Well, that's much easier then:

([0-9]+) - .*Lev [4-9]

Upvotes: 1
