Jacques de Hooge

Reputation: 7000

Why doesn't $ in my regex match end of line?

I have the following test program:

import re

class Test:
    def __init__ (self):
        self.idFiltering = True
        self.aliases = [
            ('rose', 'jasmin')
        ]

        for s in (
            '__rose__',
            'rose',

            'moon__rose',
            'rose__fish',
            'moon__rose__jelly__fish',
            'moon__rose__rose__rose__fish',

            'sun.moon.rose',
            'rose.fish',
            'rosexfish',
            'moon.rose.jelly__fish',

            'moon/rose',
            'rose/fish',
            'moon/rose/jelly__fish',

        ):   
            print (s, self.filterId (s))
        print ('done')

    def filterId (self, qualifiedId):
        if not self.idFiltering or (qualifiedId.startswith ('__') and qualifiedId.endswith ('__')):
            return qualifiedId
        else:        
            for alias in self.aliases:
                pattern = re.compile (rf'((__)|(?=[^./])){alias [0]}((__)|(?=[./$]))')

                # Replace twice to deal with overlap
                qualifiedId = pattern.sub (alias [1], qualifiedId)
                qualifiedId = pattern.sub (alias [1], qualifiedId)

            return qualifiedId

test = Test ()

I expect it to produce:

__rose__ __rose__
rose jasmin
moon__rose moon__jasmin
rose__fish jasminfish
moon__rose__jelly__fish moonjasminjelly__fish
moon__rose__rose__rose__fish moonjasminjasminjasminfish
sun.moon.rose sun.moon.jasmin
rose.fish jasmin.fish
rosexfish rosexfish
moon.rose.jelly__fish moon.jasmin.jelly__fish
moon/rose moon/jasmin
rose/fish jasmin/fish
moon/rose/jelly__fish moon/jasmin/jelly__fish
done

But it produces:

__rose__ __rose__
rose rose
moon__rose moon__rose
rose__fish jasminfish
moon__rose__jelly__fish moonjasminjelly__fish
moon__rose__rose__rose__fish moonjasminjasminjasminfish
sun.moon.rose sun.moon.rose
rose.fish jasmin.fish
rosexfish rosexfish
moon.rose.jelly__fish moon.jasmin.jelly__fish
moon/rose moon/rose
rose/fish jasmin/fish
moon/rose/jelly__fish moon/jasmin/jelly__fish
done

In other words, it doesn't replace 'rose' when it appears at the very end of the string. It seems to ignore the $ in my pattern. What am I doing wrong?

[EDIT after comments of Aran-Fey and Pushpesh Kumar Rajwanshi]

I've changed the regex to:

rf'((__)|(?=[^./])){alias [0]}((__)|(?=[./])|$)'

and it works fine now, so my problem is solved.

I've also tried:

rf'(^|(__)|(?=[./])){alias [0]}((__)|(?=[./])|$)'

but that does not work. Just curious: Why not?

[EDIT2]

As Rarblack pointed out, my solution just worked by sheer luck. With his/her suggestion I think I found the right regex:

rf'(^|(__)|(?<=[./])){alias [0]}((__)|(?=[./])|$)'

It produces the expected output, and this time not by coincidence.
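The only difference between the second attempt and this last pattern is (?=[./]) versus (?<=[./]). A minimal sketch of why that matters, using stripped-down patterns (the names lookahead and lookbehind are just illustrative, and the patterns are reduced to the one branch in question rather than being the full regex above):

```python
import re

alias = 'rose'

# Lookahead: asserts that the NEXT character is '.' or '/'. But the next
# character must also be the 'r' of the alias, so this can never match.
lookahead = re.compile(rf'(?=[./]){alias}')

# Lookbehind: asserts that the PREVIOUS character is '.' or '/'.
lookbehind = re.compile(rf'(?<=[./]){alias}')

print(lookahead.search('moon.rose'))           # None
print(lookbehind.search('moon.rose').group())  # rose
print(lookbehind.sub('jasmin', 'moon/rose'))   # moon/jasmin
```

Neither construct consumes characters, which is why the . or / stays in the result after substitution.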

Upvotes: 1

Views: 61

Answers (1)

Rarblack

Reputation: 4664

Inside a character class ([...]), most special regex characters lose their meaning and are treated as ordinary characters. That is why [./$] does not work: the $ there matches a literal dollar sign, not the end of the string. Also, a ^ as the first character inside square brackets negates the class: [^./] matches any single character except . and /.
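A small sketch of both points, assuming Python's re module as in the question. The pattern names in_class and anchored are made up for this illustration, and the patterns are simplified versions of the trailing part of the question's regex:

```python
import re

# Inside [...], $ is a literal dollar sign, so this lookahead needs an
# actual '.', '/' or '$' character to follow 'rose'.
in_class = re.compile(r'rose(?=[./$])')

# Moving $ out of the class makes it an end-of-string anchor again.
anchored = re.compile(r'rose(?:(?=[./])|$)')

print(in_class.search('moon.rose'))          # None: nothing follows 'rose'
print(in_class.search('rose$') is not None)  # True: a literal '$' follows
print(anchored.search('moon.rose').group())  # rose

# A leading ^ negates the class: every character except '.' and '/'.
print(re.findall(r'[^./]', 'a.b/c'))         # ['a', 'b', 'c']
```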

Upvotes: 2
