Reputation: 91959
I have a pattern compiled as
pattern_strings = ['\xc2d', '\xa0', '\xe7', '\xc3\ufffdd', '\xc2\xa0', '\xc3\xa7', '\xa0\xa0', '\xc2', '\xe9']
join_pattern = '|'.join(pattern_strings)
pattern = re.compile(join_pattern)
and then I find pattern in file as
def find_pattern(path):
with open(path, 'r') as f:
for line in f:
print line
found = pattern.search(line)
if found:
print dir(found)
logging.info('found - ' + found)
and my input as path
file is
\xc2d
d\xa0
\xe7
\xc3\ufffdd
\xc3\ufffdd
\xc2\xa0
\xc3\xa7
\xa0\xa0
'619d813\xa03697'
When I run this program, nothing happens.
I it not able to catch these patterns, what is am I doing wrong here?
Desired output - each line because each line has one or the other matching pattern
Update
After changing the regex to
pattern_strings = ['\\xc2d', '\\xa0', '\\xe7', '\\xc3\\ufffdd', '\\xc2\\xa0', '\\xc3\\xa7', '\\xa0\\xa0', '\\xc2', '\\xe9']
It is still the same, no output
UPDATE
after making regex to
pattern_strings = ['\\xc2d', '\\xa0', '\\xe7', '\\xc3\\ufffdd', '\\xc2\\xa0', '\\xc3\\xa7', '\\xa0\\xa0', '\\xc2', '\\xe9']
join_pattern = '[' + '|'.join(pattern_strings) + ']'
pattern = re.compile(join_pattern)
Things started to work, but partially, the patterns still not caught are for line
\xc2\xa0
\xc3\xa7
\xa0\xa0
for which my pattern string is ['\\xc2\\xa0', '\\xc3\\xa7', '\\xa0\\xa0']
Upvotes: 1
Views: 1902
Reputation: 113940
escape the \ in the search patterns
either with r"\xa0"
or as "\\xa0"
do this ....
['\\xc2d', '\\xa0', '\\xe7', '\\xc3\\ufffdd', '\\xc2\\xa0', '\\xc3\\xa7', '\\xa0\\xa0', '\\xc2', '\\xe9']
like everyones been saying to do except the one guy you listened too...
Upvotes: 2
Reputation: 251355
Does your file actually contain \xc2d
--- that is, five characters: a backslash followed by c, then 2, then d? If so, your regex won't match it. Each of your regexes will match one or two characters with certain character codes. If you want to match the string \xc2d
your regex needs to be \\xc2d
.
Upvotes: 0