Optional match for beginning of line

Question

I am trying to create a regular expression in Python that matches #hashtags. My definition of a hashtag is:

It is a word that starts with a #
It can contain any character except a space, comma or period ([ ,\.])
It can be anywhere in the text

So in this text

#This string cont#ains #four, and #only four #hashtags.

The hashtags here are This, four, only and hashtags.

The problem I have is the optional check for the beginning of line.

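[ \.,]+ won't do it, since it requires at least one delimiter before the # and so misses a hashtag at the very beginning.
[ \.,]? won't do it, since it matches too much: the delimiter becomes optional everywhere, so a # in the middle of a word (cont#ains) matches too.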

Example with +

In []: re.findall(r'[ \.,]+#([^ \.,]+)', '#This string cont#ains #four, and #only four #hashtags.')
Out[]: ['four', 'only', 'hashtags']

Example with ?

In []: re.findall(r'[ \.,]?#([^ \.,]+)', '#This string cont#ains #four, and #only four #hashtags.')
Out[]: ['This', 'ains', 'four', 'only', 'hashtags']

How can I optionally match the beginning of the line?

Blender · Accepted Answer

This seems to work:

>>> re.findall(r'\B#([^,\W]+)', '#This string cont#ains #four, and #only four #hashtags.')
['This', 'four', 'only', 'hashtags']

\B: Matches the empty string, but only when it is not at the beginning or end of a word. This means that r'py\B' matches 'python', 'py3', 'py2', but not 'py', 'py.', or 'py!'. \B is just the opposite of \b, so is also subject to the settings of LOCALE and UNICODE.
\W: When the LOCALE and UNICODE flags are not specified, matches any non-alphanumeric character; this is equivalent to the set [^a-zA-Z0-9_]. With LOCALE, it will match any character not in the set [0-9_], and not defined as alphanumeric for the current locale. If UNICODE is set, this will match anything other than [0-9_] plus characters classified as not alphanumeric in the Unicode character properties database.
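
To make the behaviour concrete, here is a small sketch, assuming Python 3. It runs the pattern above next to an alternative that sticks to the question's own character set. The alternative pattern and the names answer_pattern, literal_pattern and sample are illustrative assumptions, not part of the original answer. Note that [^,\W] accepts only word characters, so the two patterns may disagree on hashtags that contain punctuation such as a hyphen.

```python
import re

text = '#This string cont#ains #four, and #only four #hashtags.'

# Pattern from the answer. '#' is a non-word character, so \B in front of it
# succeeds only at the start of the string or after another non-word
# character. That is why the '#' inside 'cont#ains' is rejected.
answer_pattern = re.compile(r'\B#([^,\W]+)')
print(answer_pattern.findall(text))

# Hypothetical alternative (an assumption, not from the answer): require the
# start of the string or one of the question's delimiters before '#', and
# keep the question's own definition of the hashtag body.
literal_pattern = re.compile(r'(?:^|(?<=[ .,]))#([^ .,]+)')
print(literal_pattern.findall(text))

# The two patterns differ once a hashtag contains punctuation:
# [^,\W]+ stops at the hyphen, [^ .,]+ does not.
sample = '#foo-bar baz'
print(answer_pattern.findall(sample))
print(literal_pattern.findall(sample))
```

On the text from the question both patterns return the same four hashtags. Which one fits better depends on whether punctuation such as a hyphen should count as part of a hashtag.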
