marOne

Reputation: 149

RegEx for matching capital letters and numbers

Hi, I have a large corpus that I parse to extract all occurrences of two kinds of patterns:

  1. codes made of capital letters followed by digits, such as AP70, ML71, GR55, etc.
  2. sequences of words that each start with a capital letter, such as Hello Little Monkey, How Are You, etc.

For the first case I wrote this regex, but it doesn't return all the matches:

>>> p = re.compile("[A-Z]+[0-9]+")
>>> res = p.search("aze azeaz GR55 AP1 PM89")
>>> res
<re.Match object; span=(10, 14), match='GR55'>

and for the second one:

>>> s = re.compile(r"[A-Z]+[a-z]+\s[A-Z]+[a-z]+\s[A-Z]+[a-z]+")
>>> resu = s.search("this is a test string, Hello Little Monkey, How Are You ?")
>>> resu
<re.Match object; span=(23, 42), match='Hello Little Monkey'>
>>> resu.group()
'Hello Little Monkey'

It seems to work, but I want to get all the matches when parsing a whole 'big' line.

Upvotes: 0

Views: 671

Answers (2)

Emma

Reputation: 27723

This expression might help you do that, or serve as a starting point for designing your own. It seems you want each match to contain at least one [A-Z] and at least one [0-9]:

(?=[A-Z])(?=.+[0-9])([A-Z0-9]+)

Example Code:

This code shows how the expression would work in Python:

# -*- coding: UTF-8 -*-
import re

string = "aze azeaz GR55 AP1 PM89"
expression = r'(?=[A-Z])(?=.+[0-9])([A-Z0-9]+)'
match = re.search(expression, string)
if match:
    print("YAAAY! \"" + match.group(1) + "\" is a match 💚💚💚 ")
else: 
    print('🙀 Sorry! No matches! Something is not right! Call 911 👮')

Example Output

YAAAY! "GR55" is a match 💚💚💚 
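
Since the question asks for every match rather than only the first one, the same expression can be passed to `re.findall` instead of `re.search`. This is a minimal sketch that reuses the sample string from the question; the variable name `matches` is only illustrative:

```python
import re

string = "aze azeaz GR55 AP1 PM89"
expression = r'(?=[A-Z])(?=.+[0-9])([A-Z0-9]+)'

# re.search stops at the first match; re.findall collects every
# non-overlapping match. With a single capture group, findall
# returns the text captured by that group.
matches = re.findall(expression, string)
print(matches)
```

One caveat worth keeping in mind: the second lookahead only checks that a digit occurs somewhere later in the line, not inside the token itself, so an input such as "ABC x1" would yield "ABC".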

Performance

This JavaScript snippet measures the running time of the expression over a simple loop of one million iterations.

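const repeat = 1000000;
const start = Date.now();
let match;

// Run the replacement `repeat` times and time the whole loop.
for (let i = repeat; i > 0; i--) {
	const string = 'aze azeaz GR55 AP1 PM89';
	const regex = /(.*?)(?=[A-Z])(?=.+[0-9])([A-Z0-9]+)/g;
	match = string.replace(regex, "$2 ");
}

// Elapsed time in milliseconds.
const end = Date.now() - start;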
console.log("YAAAY! \"" + match + "\" is a match 💚💚💚 ");
console.log(end / 1000 + " is the runtime of " + repeat + " times benchmark test. 😳 ");

Upvotes: 2

user557597

Try these two regexes:

(for safety, each match must be bounded by whitespace, a comma, or the start/end of the string)


>>> import re
>>> teststr = "aze azeaz GR55 AP1 PM89"
>>> res = re.findall(r"(?<![^\s,])[A-Z]+[0-9]+(?![^\s,])", teststr)
>>> print(res)
['GR55', 'AP1', 'PM89']
>>>

Readable regex

 (?<! [^\s,] )
 [A-Z]+ [0-9]+ 
 (?! [^\s,] )
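
The readable form above can be used directly in Python through the `re.VERBOSE` flag, which ignores whitespace outside character classes and allows comments. This is a rough sketch; the names `token` and `codes` and the extended test string are assumptions made for illustration:

```python
import re

# Same pattern as above, written with re.VERBOSE so that layout
# whitespace and comments are ignored by the regex engine.
token = re.compile(r"""
    (?<! [^\s,] )    # not preceded by anything except whitespace or a comma
    [A-Z]+ [0-9]+    # one or more capitals followed by one or more digits
    (?! [^\s,] )     # not followed by anything except whitespace or a comma
""", re.VERBOSE)

# "xGR55" is skipped because the capitals are preceded by a lowercase letter.
codes = token.findall("aze azeaz GR55 AP1 PM89, xGR55 ML71")
print(codes)
```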

and

>>> import re
>>> teststr = "this is a test string, ,Hello Little Monkey, How Are You ?"
>>> res = re.findall(r"(?<![^\s,])[A-Z]+[a-z]+(?:\s[A-Z]+[a-z]+){1,}(?![^\s,])", teststr)
>>> print(res)
['Hello Little Monkey', 'How Are You']
>>>

Readable regex

 (?<! [^\s,] )
 [A-Z]+ [a-z]+ 
 (?: \s [A-Z]+ [a-z]+ ){1,}
 (?! [^\s,] )
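
For a whole corpus, one possible approach is to compile both patterns once and run `finditer` over each line. The sketch below assumes the corpus is available as a list of strings; `CODE`, `PHRASE`, `extract` and `corpus` are hypothetical names, and `+` is used in place of the equivalent `{1,}`:

```python
import re

# The two patterns from this answer, compiled once for reuse.
CODE = re.compile(r"(?<![^\s,])[A-Z]+[0-9]+(?![^\s,])")
PHRASE = re.compile(r"(?<![^\s,])[A-Z]+[a-z]+(?:\s[A-Z]+[a-z]+)+(?![^\s,])")


def extract(lines):
    """Yield (line_number, kind, text) for every match in an iterable of lines."""
    for number, line in enumerate(lines, start=1):
        for m in CODE.finditer(line):
            yield number, "code", m.group()
        for m in PHRASE.finditer(line):
            yield number, "phrase", m.group()


# Hypothetical two-line corpus built from the question's sample strings.
corpus = [
    "aze azeaz GR55 AP1 PM89",
    "this is a test string, Hello Little Monkey, How Are You ?",
]
results = list(extract(corpus))
for row in results:
    print(row)
```

Because `extract` is a generator, it can also be fed an open file object line by line, which avoids loading a large corpus into memory at once.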

Upvotes: 2
