Bill Lefler

How do I find two instances on a single line with a Regular Expression?

I'm trying to use regular expressions to find and replace some text in a folder of documents. My problem is that when the text appears twice on the same line, the regular expression runs the two occurrences together into a single match that stretches from the start of the first to the end of the second.

Here is my attempt at the regular expression:

\\x.*\\x\*

The text I am trying to match starts with \x and ends with \x*.

This first example contains one match:

2Y Sara concibió \x a \xo 21.2: \xt Heb 11.11.\x* y le dio un hijo a Abrahán en su vejez, en el tiempo preciso que Dios le había anunciado.

This second example contains two matches, but they are run together by the regular expression:

2Los creó hombre y mujer,\x a \xo 5.2: \xt Mt 19.4; Mc 10.6.\x* y los bendijo.\x b \xo 5.1-2: \xt Gn 1.27-28.\x* El día en que fueron creados les puso por nombre Adán.

I've never become proficient at regular expressions because of frustrations like these; using them always sounds like a great idea. But I'm trying to learn!

Answers (1)

Jeff Bowman

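Your .* matches asterisks as well. By default, regular expressions are greedy: they match as many characters as they can, then backtrack one character at a time until the rest of the pattern succeeds. Here that means .* runs all the way to the last \x* on the line, swallowing any \x* notes in between.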
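To see the greedy behavior concretely, here is a minimal sketch using Python's re module (my choice of tool; the question doesn't say which find-and-replace program is in use) that runs the question's pattern on the second example line:

```python
import re

# Second example from the question: two \x ... \x* notes on one line.
# Raw strings (r"...") keep the backslashes literal.
line = (
    r"2Los creó hombre y mujer,\x a \xo 5.2: \xt Mt 19.4; Mc 10.6.\x* "
    r"y los bendijo.\x b \xo 5.1-2: \xt Gn 1.27-28.\x* "
    r"El día en que fueron creados les puso por nombre Adán."
)

# The pattern from the question. \\x matches a literal backslash followed
# by x, and \* matches a literal asterisk.
greedy = re.compile(r"\\x.*\\x\*")

matches = greedy.findall(line)
print(len(matches))  # 1
print(matches[0])
# \x a \xo 5.2: \xt Mt 19.4; Mc 10.6.\x* y los bendijo.\x b \xo 5.1-2: \xt Gn 1.27-28.\x*
```

Only one match comes back, and it spans both notes plus the text between them.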

One option is to follow the .* with a question mark, creating .*?. This makes the quantifier non-greedy (lazy), so it matches the fewest characters possible. The trailing ? syntax is supported in Perl, ECMAScript (JavaScript), Java, and most other implementations, but not in POSIX/GNU implementations. For example:

\\x.*?\\x\*
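A minimal Python sketch of the lazy pattern on the same line. The replacement string here (an empty string, which deletes each note) is only an illustrative assumption; substitute whatever text you actually need.

```python
import re

# Second example from the question, kept literal with raw strings.
line = (
    r"2Los creó hombre y mujer,\x a \xo 5.2: \xt Mt 19.4; Mc 10.6.\x* "
    r"y los bendijo.\x b \xo 5.1-2: \xt Gn 1.27-28.\x* "
    r"El día en que fueron creados les puso por nombre Adán."
)

# Lazy version: .*? stops at the FIRST \x* after each opening \x.
lazy = re.compile(r"\\x.*?\\x\*")

notes = lazy.findall(line)
for note in notes:
    print(note)
# \x a \xo 5.2: \xt Mt 19.4; Mc 10.6.\x*
# \x b \xo 5.1-2: \xt Gn 1.27-28.\x*

# Illustrative find-and-replace: delete every note.
cleaned = lazy.sub("", line)
print(cleaned)
# 2Los creó hombre y mujer, y los bendijo. El día en que fueron creados les puso por nombre Adán.
```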

Another alternative is to match every character except an asterisk, which in regular expressions looks like [^*]*, giving \\x[^*]*\\x\*. However, this will prevent you from matching any asterisk inside the text, even one not preceded by a backslash.
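A short Python sketch of the negated-character-class approach and its drawback. The "tricky" line below is a made-up example (not from the question) containing a plain asterisk inside a note.

```python
import re

line = (
    r"2Los creó hombre y mujer,\x a \xo 5.2: \xt Mt 19.4; Mc 10.6.\x* "
    r"y los bendijo.\x b \xo 5.1-2: \xt Gn 1.27-28.\x* "
    r"El día en que fueron creados les puso por nombre Adán."
)

# [^*]* cannot cross an asterisk, so each match ends at the first \x*
# following its opening \x.
no_star = re.compile(r"\\x[^*]*\\x\*")
found = no_star.findall(line)
print(len(found))  # 2

# Hypothetical note with a plain asterisk (no backslash before it):
# [^*]* cannot get past it, so the note is not matched at all.
tricky = r"before \x a \xo 1.1: \xt a * in the note\x* after"
print(no_star.findall(tricky))  # []

# The lazy pattern still finds it, because . matches the stray asterisk.
print(re.findall(r"\\x.*?\\x\*", tricky))
```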

(Thank you lxop for noting the errata!)