Fokker

Reputation: 11

Regex to match certain numbers from and to a range

I have asked ChatGPT for a correct answer, but it doesn't give me a working solution.

I want to match only the digits before "m2", and only if they are preceded somewhere earlier by the word "landgang", as in this example:

"kolko 12/43 4:34, tetro gamm m2 landgang 2/2 metros tetro dento 12 343m2 psi 23"

ChatGPT gives me this pattern, but it doesn't work:

(?<=landgang\s)(\d+)(?=\s*m2)
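A quick check in Ruby (the language the answers below target) suggests why this pattern finds nothing: the lookbehind only admits digits that come immediately after "landgang ", and the only such digit, the 2 in "2/2", is followed by "/" rather than by optional whitespace and "m2". The variable names below are just for illustration.

```ruby
sample = "kolko 12/43 4:34, tetro gamm m2 landgang 2/2 metros tetro dento 12 343m2 psi 23"

# The pattern from the question: the digits must sit directly after "landgang "
# and directly before optional whitespace plus "m2".
chatgpt_pattern = /(?<=landgang\s)(\d+)(?=\s*m2)/

# Only the "2" in "2/2" follows "landgang ", and it is followed by "/",
# so the lookahead fails and String#[] returns nil.
result = sample[chatgpt_pattern]
puts result.inspect
```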

I hope somebody can help me. Many thanks!

Upvotes: 0

Views: 90

Answers (3)

user3408541

Reputation: 71

Ahoy!

This answer is in Perl instead of Ruby, but it is only a regex inside an if conditional, so the solution carries over almost exactly. Ruby's regex engine (Onigmo) is not PCRE, but it accepts this pattern unchanged.

The regex is

/landgang[\w\W]*?([ \d]+?)m2/

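Basically: search for landgang, then match anything, including \n, in a non-greedy fashion, so that the match stops at the first m2 that follows instead of running on to a later one. Then match one or more digits or spaces, also non-greedily, immediately followed by m2, and store them in capture group 1 ($1). The capture also picks up the space before 12, which the code below trims. I used [\w\W]*? instead of .*? in case there are newlines in this sequence, because . does not match a newline by default.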

Here's the code...

#!/usr/bin/perl
use strict;
use warnings;

my $s = "kolko 12/43 4:34, tetro gamm m2 landgang 2/2 metros tetro dento 12 343m2 psi 23";
my $matchingDigits;
if ($s =~ /landgang[\w\W]*?([ \d]+?)m2/) {
  $matchingDigits = $1;
}

if (defined $matchingDigits) {
  # trim the leading spaces captured before the first digit
  $matchingDigits =~ s/^ +(\d)/$1/;
  print "$matchingDigits\n";
} else {
  print "no match found\n";
}

Output looks like this...

$ perl match.sequence.ruby.psudocode.pl
12 343

This Perl should translate easily to Ruby, and the regular expressions can be used there unchanged. If you need any help translating, just ask in the comments.
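For reference, here is a minimal Ruby sketch of the same logic. The helper name landgang_digits is hypothetical; it returns the captured digits (with the inner space) or nil when there is no match. String#lstrip stands in for the Perl trimming substitution, with the small difference that it removes all leading whitespace.

```ruby
# Hypothetical helper: the digits (and inner spaces) before "m2" that follow
# "landgang", or nil when the pattern does not match.
def landgang_digits(s)
  # Same pattern as the Perl version; Ruby accepts it unchanged.
  m = s.match(/landgang[\w\W]*?([ \d]+?)m2/)
  return nil unless m

  # The capture starts at the space before "12", so trim leading whitespace
  # (similar to s/^ +(\d)/$1/ in the Perl version).
  m[1].lstrip
end

s = "kolko 12/43 4:34, tetro gamm m2 landgang 2/2 metros tetro dento 12 343m2 psi 23"
digits = landgang_digits(s)
puts digits || "no match found"
```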

Upvotes: 0

Cary Swoveland

Reputation: 110735

I have assumed that the object is to match a string of digits and whitespace that:

  • begins with a digit;
  • is not immediately preceded by a digit, or by one or more whitespace characters that are themselves immediately preceded by a digit;
  • is preceded somewhere earlier in the string by "landgang", with a word boundary before and after it; and
  • is immediately followed by "m2", which is followed by a word boundary.

You can match the desired string, provided the conditions above are satisfied, with the following regular expression.

\blandgang\b.*?\K\d[\d\s]*(?=m2\b)

See Rubular and Regex101.
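In Ruby, a minimal way to use the expression is String#[] with a regex, which returns the matched text or nil. Because \K discards "landgang ..." from the reported match and the lookahead does not consume "m2", the result is just the digits and the space between them. The variable names are illustrative.

```ruby
s = "kolko 12/43 4:34, tetro gamm m2 landgang 2/2 metros tetro dento 12 343m2 psi 23"

# \K drops everything matched before it from the result, and (?=m2\b)
# checks for "m2" without including it, so only "12 343" is returned.
area = s[/\blandgang\b.*?\K\d[\d\s]*(?=m2\b)/]
puts area
```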

Note that this expression does not employ a capture group. Naturally, if the assumptions I have made are incorrect the expression would have to be changed accordingly.

This approach is similar to the one suggested by @CAustin in a comment on the question, except that @CAustin's version uses a capture group. A capture group may be faster, and it also works in engines that do not support \K, but it is less pleasing to some coders.
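A capture-group version of the same idea might look like the Ruby sketch below. It is an illustration, not necessarily the exact pattern from @CAustin's comment, which is not reproduced here. Passing 1 as the second argument to String#[] returns capture group 1 instead of the whole match.

```ruby
s = "kolko 12/43 4:34, tetro gamm m2 landgang 2/2 metros tetro dento 12 343m2 psi 23"

# Consume "landgang ... m2" and keep only group 1, instead of using \K
# and a lookahead.
pattern = /\blandgang\b.*?(\d[\d\s]*)m2\b/
area = s[pattern, 1]
puts area
```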

The regular expression can be broken down as follows.

\b         # match a word boundary
landgang   # match the literal string "landgang"
\b         # match a word boundary
.*?        # match zero or more characters (other than newlines), as few as possible
\K         # reset the start of the match, discarding all
           # previously consumed characters from the reported match
\d         # match a digit
[\d\s]*    # match zero or more digits or whitespace characters
(?=m2\b)   # a *positive lookahead* asserts that the current match
           # is followed by "m2" and then a word boundary
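Ruby's free-spacing (x) mode lets the broken-down form be used directly, since whitespace in the pattern is ignored and # starts a comment. The sketch below assumes hypothetical names AREA_RE and area, and also checks two inputs that should not match: one without "landgang" and one where "m2" is not followed by a word boundary.

```ruby
# The breakdown above, written in free-spacing mode (the x flag).
AREA_RE = /
  \b landgang \b   # the word "landgang"
  .*?              # any characters except newline, as few as possible
  \K               # forget everything matched so far
  \d [\d\s]*       # a digit, then digits or whitespace
  (?=m2\b)         # followed by "m2" and a word boundary
/x

# Hypothetical helper: the matched digits, or nil.
def area(s)
  s[AREA_RE]
end

p area("kolko 12/43 4:34, tetro gamm m2 landgang 2/2 metros tetro dento 12 343m2 psi 23")
p area("tetro gamm 12 343m2")     # no "landgang"
p area("landgang dento 5 m23")    # "m2" is followed by "3", not a word boundary
```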

Upvotes: 0

buckley

Reputation: 14119

If you don't mind spaces being matched too (on the example the match is " 12 343", including the space before 12 as well as the one after it), this will do:

(?<=landgang.*)[ \d]+(?=m2)

Note that this uses a variable-width pattern inside the lookbehind, which some regex engines (e.g. JavaScript and C#) support but not all. https://regex101.com/r/JJppI1/1
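Ruby's engine is one that is expected to reject the unbounded .* inside this lookbehind, so the sketch below tries the pattern as-is and, if compiling it raises RegexpError, falls back to a \K form that yields the same leftmost match. The helper name lookbehind_area is hypothetical. Either branch should return " 12 343" with the leading space.

```ruby
s = "kolko 12/43 4:34, tetro gamm m2 landgang 2/2 metros tetro dento 12 343m2 psi 23"

# Hypothetical helper: try the variable-width lookbehind first.
def lookbehind_area(s)
  s[Regexp.new('(?<=landgang.*)[ \d]+(?=m2)')]
rescue RegexpError
  # Fallback when the engine rejects the lookbehind: skip past "landgang"
  # lazily, then use \K to drop it from the reported match.
  s[/landgang.*?\K[ \d]+(?=m2)/]
end

p lookbehind_area(s)
```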

Upvotes: 1
