Reputation: 149
Hi, I have a large corpus that I parse to extract every occurrence of certain patterns.
For the first case I wrote this regexp, but it doesn't return all the matches:
>>> p = re.compile("[A-Z]+[0-9]+")
>>> res = p.search("aze azeaz GR55 AP1 PM89")
>>> res
<re.Match object; span=(10, 14), match='GR55'>
and for the second one:
>>> s = re.compile(r"[A-Z]+[a-z]+\s[A-Z]+[a-z]+\s[A-Z]+[a-z]+")
>>> resu = s.search("this is a test string, Hello Little Monkey, How Are You ?")
>>> resu
<re.Match object; span=(23, 42), match='Hello Little Monkey'>
>>> resu.group()
'Hello Little Monkey'
It seems to work, but I want to get all matches when parsing a whole 'big' line.
Upvotes: 0
Views: 671
Reputation: 27723
This expression might help, or serve as a starting point for designing your own. It looks like you want each match to contain at least one [A-Z] and at least one [0-9]:
(?=[A-Z])(?=.+[0-9])([A-Z0-9]+)
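Note that the second lookahead, (?=.+[0-9]), only requires a digit somewhere later in the string, not necessarily inside the matched token, so the token AB in "AB cd5" would still match.
This code shows how the expression works in Python: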
# -*- coding: UTF-8 -*-
import re
string = "aze azeaz GR55 AP1 PM89"
expression = r'(?=[A-Z])(?=.+[0-9])([A-Z0-9]+)'
match = re.search(expression, string)
if match:
    print("YAAAY! \"" + match.group(1) + "\" is a match 💚💚💚 ")
else:
    print('🙀 Sorry! No matches! Something is not right! Call 911 👮')
YAAAY! "GR55" is a match 💚💚💚
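Keep in mind that re.search stops at the first match, which is why only GR55 shows up. Since the question asks for every match in the line, here is a minimal sketch using re.findall and re.finditer with the pattern from the question (the variable names are only illustrative):

```python
import re

line = "aze azeaz GR55 AP1 PM89"
pattern = re.compile(r"[A-Z]+[0-9]+")

# findall returns every non-overlapping match as a list of strings
all_codes = pattern.findall(line)
print(all_codes)  # ['GR55', 'AP1', 'PM89']

# finditer yields match objects, handy when the positions are needed too
spans = [(m.group(), m.span()) for m in pattern.finditer(line)]
print(spans)  # [('GR55', (10, 14)), ('AP1', (15, 18)), ('PM89', (19, 23))]
```

findall is usually enough when only the matched text matters; finditer is the better fit if you also need offsets into a long line.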
This JavaScript snippet benchmarks the expression with a simple for loop of 1 million iterations.
const repeat = 1000000;
const start = Date.now();
let match;
// Count down exactly `repeat` times.
for (let i = repeat; i > 0; i--) {
    const string = 'aze azeaz GR55 AP1 PM89';
    const regex = /(.*?)(?=[A-Z])(?=.+[0-9])([A-Z0-9]+)/g;
    match = string.replace(regex, "$2 ");
}
const end = Date.now() - start;
console.log("YAAAY! \"" + match + "\" is a match 💚💚💚 ");
console.log(end / 1000 + " is the runtime of " + repeat + " times benchmark test. 😳 ");
Upvotes: 2
Reputation:
Try these two regexes
(for safety, each one is enclosed by whitespace/comma boundaries):
>>> import re
>>> teststr = "aze azeaz GR55 AP1 PM89"
>>> res = re.findall(r"(?<![^\s,])[A-Z]+[0-9]+(?![^\s,])", teststr)
>>> print(res)
['GR55', 'AP1', 'PM89']
>>>
(?<! [^\s,] )
[A-Z]+ [0-9]+
(?! [^\s,] )
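To see what the boundary lookarounds in this first regex buy you, here is a small sketch comparing it with the unbounded pattern from the question. The sample string is made up for illustration and is not from the original corpus:

```python
import re

bounded = re.compile(r"(?<![^\s,])[A-Z]+[0-9]+(?![^\s,])")
unbounded = re.compile(r"[A-Z]+[0-9]+")

# Hypothetical input: two clean tokens, plus two glued to other characters
sample = "GR55 xAP1 PM89z,QT7"

# Without boundaries, fragments inside larger tokens are picked up as well
unbounded_hits = unbounded.findall(sample)
print(unbounded_hits)  # ['GR55', 'AP1', 'PM89', 'QT7']

# With boundaries, a token must be delimited by whitespace, a comma,
# or the start/end of the string on both sides
bounded_hits = bounded.findall(sample)
print(bounded_hits)  # ['GR55', 'QT7']
```

If your corpus uses other delimiters (periods, parentheses, and so on), they would need to be added to the [^\s,] class.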
and
>>> import re
>>> teststr = "this is a test string, ,Hello Little Monkey, How Are You ?"
>>> res = re.findall(r"(?<![^\s,])[A-Z]+[a-z]+(?:\s[A-Z]+[a-z]+){1,}(?![^\s,])", teststr)
>>> print(res)
['Hello Little Monkey', 'How Are You']
>>>
(?<! [^\s,] )
[A-Z]+ [a-z]+
(?: \s [A-Z]+ [a-z]+ ){1,}
(?! [^\s,] )
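The expanded layout above maps almost directly onto Python's re.VERBOSE flag, which ignores whitespace in the pattern and allows comments. A sketch of the second regex written that way (the name title_run is just a placeholder):

```python
import re

title_run = re.compile(
    r"""
    (?<![^\s,])               # left boundary: start, whitespace or comma
    [A-Z]+[a-z]+              # first capitalised word
    (?:\s[A-Z]+[a-z]+){1,}    # one or more further capitalised words
    (?![^\s,])                # right boundary: end, whitespace or comma
    """,
    re.VERBOSE,
)

teststr = "this is a test string, ,Hello Little Monkey, How Are You ?"
phrases = title_run.findall(teststr)
print(phrases)  # ['Hello Little Monkey', 'How Are You']
```

Unlike the pattern in the question, which requires exactly three words, this one accepts runs of two or more; change {1,} to {2} if you only want three-word phrases.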
Upvotes: 2