RY4N

Reputation: 1112

Regular expression question (Python)

I want to read a Word HTML file and grab any words that contain the letters of a name, but skip words that are longer than the name.

# compiling the regular expression:
keyword = re.compile(r"^[(rR)|(yY)|(aA)|(nN)]{5}$/")

if keyword.search (line):
    print line,

I am grabbing the words with this, but I don't seem to be limiting the length properly.

Upvotes: 1

Views: 106

Answers (2)

eyquem

Reputation: 27585

Your RE "^[(rR)|(yY)|(aA)|(nN)]{5}$/" can never match any string at all, because of the '/' character after '$': once '$' has matched at the end of the string, there is no character left for '/' to consume.
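To make the point concrete, here is a minimal sketch that tries the original pattern against a few strings. The test strings are my own choice, not taken from the question:

```python
import re

# The asker's original pattern, including the stray '/' after '$'.
broken = re.compile(r"^[(rR)|(yY)|(aA)|(nN)]{5}$/")

# '$' matches only at the end of the string (or just before a final newline),
# so no '/' can ever follow it.
candidates = ['NNNNN', 'NNNNN/', 'NNNNN\n', 'NNNNN\n/']
results = [broken.search(s) for s in candidates]
print(results)  # [None, None, None, None]
```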

See the results of the RE without this '/':

import re

pat = re.compile("^[(rR)|(yY)|(aA)|(nN)]{5}$")

for ch in ('arrrN','Aar)N','()|Ny','NNNNN',
           'marrrN','12Aar)NUUU','NNNNN!'):
    print ch.ljust(15),pat.search(ch)

result

arrrN           <_sre.SRE_Match object at 0x011C8EC8>
Aar)N           <_sre.SRE_Match object at 0x011C8EC8>
()|Ny           <_sre.SRE_Match object at 0x011C8EC8>
NNNNN           <_sre.SRE_Match object at 0x011C8EC8>
marrrN          None
12Aar)NUUU      None
NNNNN!          None

My advice: think of [.....] in a RE as representing ONE character at ONE position. Every character between the brackets is one of the allowed options for that single character.

Moreover, as Adrien Plisson said, many special characters lose their special meaning between brackets [......]. Hence '(', ')' and '|' don't define groups or alternation there; they are just literal characters, allowed as options along with the letters 'aArRyYnN'.

"^[rRyYaAnN]{1,5}$" will match only strings such as 'r', 'ar', 'YNa', 'YYnA', 'Nanny': one to five characters, all taken from the set.
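A quick check of that anchored pattern, as a sketch. The strings in should_fail are my own examples, chosen to cover an empty string, a too-long word, and characters outside the set:

```python
import re

pat = re.compile(r"^[rRyYaAnN]{1,5}$")

should_match = ['r', 'ar', 'YNa', 'YYnA', 'Nanny']
# Empty, six letters, contains ')', contains 'm', trailing space.
should_fail = ['', 'Nannyy', 'Ar)N', 'marry', 'ran ']

matched = [s for s in should_match + should_fail if pat.search(s)]
print(matched)  # ['r', 'ar', 'YNa', 'YYnA', 'Nanny']
```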

If you want to match the same words anywhere in a text, you will need "[rRyYaAnN]{1,5}"
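One caveat with the unanchored form: it also matches fragments of longer words, which defeats the length limit. The sketch below compares it with a variant wrapped in \b word boundaries. The \b variant and the sample sentence are my additions, not part of the original pattern:

```python
import re

# Sample sentence invented for this illustration.
text = "Ryan ran to the granary; Nanny carried a yarn tray."

loose = re.compile(r"[rRyYaAnN]{1,5}")
bounded = re.compile(r"\b[rRyYaAnN]{1,5}\b")  # assumption: whole words only

# The loose pattern picks up pieces of longer words, e.g. 'ranar' from 'granary'.
print(loose.findall(text))

# With word boundaries, only complete words of at most five letters survive.
print(bounded.findall(text))  # ['Ryan', 'ran', 'Nanny', 'a', 'yarn']
```

Note that a one-letter word such as 'a' also qualifies; raise the lower bound in {1,5} if that is not wanted.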

Upvotes: 1

Adrien Plisson

Reputation: 23313

It seems you are looking for keyword.match() instead of keyword.search(). You should read the part of the Python documentation that discusses the difference between match and search.
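A small sketch of that difference, using an unanchored pattern and a sample line of my own so the two calls actually diverge. With a leading '^' and no re.MULTILINE flag, search() can only succeed at the start of the string anyway, so both behave alike there:

```python
import re

unanchored = re.compile(r"[rRyYaAnN]{5}")
line = "my Nanny"  # sample input invented for this illustration

# match() only tries at the very start of the string; 'm' is not in the set.
at_start = unanchored.match(line)
print(at_start)  # None

# search() scans forward and finds the first place where the pattern fits.
anywhere = unanchored.search(line)
print(anywhere.group())  # Nanny
```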

Also, your regular expression seems completely off: [ and ] delimit a set of characters, so you can't put groups inside them or apply logic around those groups. As written, your expression will also match the characters (, ) and |. You may try the following:

keyword = re.compile(r"^[rRyYaAnN]{5}$")
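If the length limit should follow the name itself, the pattern can be built from the name. This is only a sketch: build_name_pattern is a hypothetical helper, the name "ryan" and the sample line are assumptions, and re.IGNORECASE stands in for listing both cases of every letter.

```python
import re

def build_name_pattern(name):
    # Hypothetical helper: accept words made only of the name's letters,
    # with at most as many characters as the name has.
    letters = re.escape(name)  # keeps characters like '-' or ']' literal
    return re.compile(r"^[%s]{1,%d}$" % (letters, len(name)), re.IGNORECASE)

keyword = build_name_pattern("ryan")  # assumed name

line = "Ryan met Nanny near a yarn barn"  # sample line, not from the question
hits = [word for word in line.split() if keyword.match(word)]
print(hits)  # ['Ryan', 'a', 'yarn']
```

'Nanny' is rejected here because it has five letters and the assumed name has four.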

Upvotes: 3
