vlr

Reputation: 790

Asterisk in regex

I am trying to get the part of each text line that comes after the colon. For example, from this text

previous usc contact name:*assistant director of field education*

agency name:*development corporation

I want to get the following:

assistant director of field education

1010 development corporation

I tried the following regex

.*:\*?(.*)\**$ 

It did not work. What is working right now is this:

.*:\*?(.*)\*

I do not understand why it works on the second line, which has no trailing asterisk even though the regex requires one. I also do not understand why the first regex does not work properly.

Thanks.

Upvotes: 5

Views: 25955

Answers (1)

dognose

Reputation: 20889

In a nutshell:

The second regex .*:\*?(.*)\* works because:

.* is matching:

  • previous usc contact name and
  • agency name

followed by :\*? (the escaped \* matches a literal *, and the ? makes it optional).

(.*)\* finally matches EVERYTHING up to the LAST *.

(Assuming you missed the star in the last line, this matches:)

  • assistant director of field education and
  • development corporation
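
A minimal sketch of the behaviour described above, assuming Python's re module (the question does not name a regex engine, so this choice is an assumption). It also shows what happens if the last line really has no trailing star: the pattern still matches, because the optional \*? gives up its asterisk to the final \*, which leaves the capture group empty.

```python
import re

# The second regex from the question.
pattern = re.compile(r".*:\*?(.*)\*")

with_star = "previous usc contact name:*assistant director of field education*"
without_star = "agency name:*development corporation"

m1 = pattern.search(with_star)
m2 = pattern.search(without_star)

# Line with a trailing star: the group holds the text between the stars.
print(repr(m1.group(1)))  # 'assistant director of field education'

# Line without a trailing star: the match stops right after ":*",
# so the group is empty even though the pattern "works".
print(repr(m2.group(0)))  # 'agency name:*'
print(repr(m2.group(1)))  # ''
```

So a successful match on the second line does not by itself mean the group captured the wanted text; it is worth checking the content of group 1.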

Why the first regex fails is hard to tell from the example given. .*:\*?(.*)\**$ means that the line must END with zero or more * characters (\**). Because \** may match nothing and (.*) is greedy, the group can also swallow a trailing * itself.

Assuming your line breaks are as provided, it will only match development corporation, because by default (single-line mode) the anchor $ means "end of string". Therefore the regex is only able to match ONCE. If you switch to multiline mode (meaning $ matches at the end of every line rather than only at the END OF STRING), you will get the required result.


SingleLine-Mode, matching:

  • development corporation

    .*:\*?(.*)\**$



Multiline-Mode matching:

  • assistant director of field education and
  • development corporation

    .*:\*?(.*)\**$
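
The two modes can be compared with a short sketch, again assuming Python's re module, where multiline mode is the re.MULTILINE flag. The lazy variant at the end is only one possible adjustment and is not part of the original question or answer; it is included because the greedy group keeps a trailing * on the first line.

```python
import re

text = (
    "previous usc contact name:*assistant director of field education*\n"
    "agency name:*development corporation"
)

# The first regex from the question.
pattern = r".*:\*?(.*)\**$"

# Default mode: $ only matches at the end of the string, so only the
# last line can match.
default_groups = re.findall(pattern, text)
print(default_groups)

# Multiline mode: $ matches at the end of every line, so both lines match.
# The greedy (.*) keeps the trailing * on the first line.
multiline_groups = re.findall(pattern, text, re.MULTILINE)
print(multiline_groups)

# Hypothetical adjustment: a lazy group (.*?) leaves trailing stars to \**.
lazy_pattern = r".*:\*?(.*?)\**$"
lazy_groups = re.findall(lazy_pattern, text, re.MULTILINE)
print(lazy_groups)
```

With the lazy group, both lines yield the text after the colon without surrounding asterisks.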



The behaviour of ^ and $ depends on the modifier:

given the String

Hello
World

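Using ^(.*)$ in single-line mode anchors the pattern to the whole string, so it can only match Hello and World together as one match (and only if . is also allowed to match the line break). Using the same pattern in multiline mode will match Hello and World as two separate matches.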

In SingleLine, the String will be handled by the regex engine like

^Hello
World$

In MultiLine Mode, the engine treats it like

^Hello$
^World$
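
The anchor behaviour above can be checked with a small sketch, assuming Python's re module. The flag names re.MULTILINE and re.DOTALL are Python's; other engines use different names for the same ideas (for example, some call the "dot matches line breaks" option single-line mode).

```python
import re

text = "Hello\nWorld"
pattern = r"^(.*)$"

# Default: ^ and $ anchor the whole string and . stops at the line break,
# so the pattern cannot span both lines and finds nothing.
default_matches = re.findall(pattern, text)
print(default_matches)

# Multiline: ^ and $ anchor every line, giving one match per line.
multiline_matches = re.findall(pattern, text, re.MULTILINE)
print(multiline_matches)

# Dot-all: . also matches the line break, so the whole string is one match.
dotall_matches = re.findall(pattern, text, re.DOTALL)
print(dotall_matches)
```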

Upvotes: 6
