daydreamer

Reputation: 91959

python regex: pattern not found

I have a pattern compiled as

import re

pattern_strings = ['\xc2d', '\xa0', '\xe7', '\xc3\ufffdd', '\xc2\xa0', '\xc3\xa7', '\xa0\xa0', '\xc2', '\xe9']
join_pattern = '|'.join(pattern_strings)
pattern = re.compile(join_pattern)

and then I search each line of a file for the pattern with

def find_pattern(path):
    with open(path, 'r') as f:
        for line in f:
            print line
            found = pattern.search(line)
            if found:
                print dir(found)
                # found is a match object; group() returns the matched text
                logging.info('found - ' + found.group())

and the input file passed as path contains

\xc2d 
d\xa0 
\xe7 
\xc3\ufffdd 
\xc3\ufffdd 
\xc2\xa0 
\xc3\xa7 
\xa0\xa0 
'619d813\xa03697' 

When I run this program, nothing happens.

It is not able to catch these patterns. What am I doing wrong here?

Desired output: every line, because each line contains at least one of the patterns.

Update

After changing the regex to

pattern_strings = ['\\xc2d', '\\xa0', '\\xe7', '\\xc3\\ufffdd', '\\xc2\\xa0', '\\xc3\\xa7', '\\xa0\\xa0', '\\xc2', '\\xe9']

It is still the same: no output.

Update

After changing the regex to

pattern_strings = ['\\xc2d', '\\xa0', '\\xe7', '\\xc3\\ufffdd', '\\xc2\\xa0', '\\xc3\\xa7', '\\xa0\\xa0', '\\xc2', '\\xe9']
join_pattern = '[' + '|'.join(pattern_strings) + ']'
pattern = re.compile(join_pattern)

Things started to work, but only partially. The lines that are still not caught are

\xc2\xa0 
\xc3\xa7 
\xa0\xa0 

for which my pattern strings are ['\\xc2\\xa0', '\\xc3\\xa7', '\\xa0\\xa0']

Upvotes: 1

Views: 1902

Answers (2)

Joran Beasley

Reputation: 113940

Escape the \ in the search patterns, either with r"\xa0" or as "\\xa0".

Do this:

 ['\\xc2d', '\\xa0', '\\xe7', '\\xc3\\ufffdd', '\\xc2\\xa0', '\\xc3\\xa7', '\\xa0\\xa0', '\\xc2', '\\xe9']

like everyone's been saying to do, except the one guy you listened to...
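
As a rough illustration of what the two spellings mean (a sketch, not part of the original answer; the variable names are made up for the example): r"\xa0" and "\\xa0" are the same four-character Python string, but the re module then decodes \xa0 as a hex escape of its own. So a pattern written this way matches the single character 0xA0. If the file holds the literal text backslash-x-a-0 instead, the backslash likely has to be doubled at the regex level as well.

```python
import re

# Both spellings give the same four-character Python string: \ x a 0
same_string = ('\\xa0' == r'\xa0')

nbsp_char = '\xa0'       # one character, code point 0xA0
literal_text = r'\xa0'   # four characters: backslash, x, a, 0

# The regex engine decodes \xa0 itself, so this matches the single character.
char_pattern = re.compile(r'\xa0')

# A doubled backslash at the regex level matches the literal four-character text.
text_pattern = re.compile(r'\\xa0')

print(same_string)                                    # True
print(bool(char_pattern.search(nbsp_char)))           # True
print(bool(char_pattern.search(literal_text)))        # False
print(bool(text_pattern.search(literal_text)))        # True
```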

Upvotes: 2

BrenBarn

Reputation: 251355

Does your file actually contain \xc2d --- that is, five characters: a backslash followed by c, then 2, then d? If so, your regex won't match it. Each of your regexes will match one or two characters with certain character codes. If you want to match the string \xc2d your regex needs to be \\xc2d.
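
A minimal sketch of that idea, assuming the file really does contain the literal backslash text shown in the question. Here re.escape adds the regex-level escaping, so the strings can be written once as raw strings. The helper name find_first is hypothetical, and sorting longest-first is an extra choice made for this example so that \xc2\xa0 is tried before the shorter \xc2.

```python
import re

# Assumption: each entry is literal text, e.g. backslash, x, c, 2, d (five characters).
literal_strings = [r'\xc2d', r'\xa0', r'\xe7', r'\xc3\ufffdd', r'\xc2\xa0',
                   r'\xc3\xa7', r'\xa0\xa0', r'\xc2', r'\xe9']

# Try longer alternatives first so a short prefix does not win over a longer match.
literal_strings.sort(key=len, reverse=True)

# re.escape turns each backslash into an escaped backslash for the regex engine.
pattern = re.compile('|'.join(re.escape(s) for s in literal_strings))

def find_first(line):
    """Return the first matching literal in line, or None if nothing matches."""
    match = pattern.search(line)
    return match.group() if match else None

print(find_first(r'\xc2\xa0 '))           # \xc2\xa0
print(find_first(r"'619d813\xa03697'"))   # \xa0
print(find_first('plain text'))           # None
```

If the file instead contains the actual bytes (for example a real 0xA0 character), the original unescaped patterns would be the ones to use.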

Upvotes: 0
