alex

Reputation: 95

Matching the second occurrence of a string anywhere on the line?

Hi, I want to match the second occurrence of a string on a line, and only that occurrence, nothing else. In my example below, the word1 inside square brackets is the one that should be matched.

word1 tex text [word1] text
word1 [word1] word1
word1 [word1]
word1 text [word1]

Can you please help me? I am learning regular expressions and I can't find an answer on the internet or in books. I am using a Notepad-style editor that accepts .NET-compatible or Perl regular expressions.

Thank you.

Upvotes: 1

Views: 171

Answers (2)

jpmc26

Reputation: 29866

It depends on what you want to do with the match.

If you only need a yes/no answer as to whether the line contains the word twice, it's straightforward: just look for the word twice.

word1.*word1

The .* matches any number of any characters, so it just looks for the word twice with anything between.
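As a rough illustration of the yes/no check, here is a minimal Python sketch (Python is only used for demonstration; the question is about an editor). The sample lines come from the question, except the last one, which is a made-up line with a single occurrence, and the helper name `has_two_occurrences` is hypothetical.

```python
import re

# Sample lines from the question, plus one made-up line (the last one)
# that contains the word only once.
lines = [
    "word1 tex text [word1] text",
    "word1 [word1] word1",
    "word1 [word1]",
    "word1 text [word1]",
    "word1 only once",
]

# The word, anything in between, then the word again.
pattern = re.compile(r"word1.*word1")


def has_two_occurrences(line: str) -> bool:
    """Return True if the word appears at least twice on the line."""
    return pattern.search(line) is not None


results = [has_two_occurrences(line) for line in lines]
print(results)
```

Note that `.` does not match a newline by default, so the check is naturally per line when each line is tested on its own.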

If you want to replace it, keep it simple. Just match everything from the first occurrence through the second, and rebuild it in the replacement:

word1(.*?)word1

For replacement, you need to add the ?. The ? keeps the * from being greedy, so the match stops at the second occurrence of word1 instead of running on to the third when there are three.

Replace the match with:

word1\1newword

The \1 represents everything inside the parentheses (i.e., everything picked up by .*?). The syntax for the group reference varies depending on which regex engine you're using. For example, .NET (and therefore PowerShell) uses $1 in the replacement string instead of \1.
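The replacement approach can be sketched in Python as follows. This is only an illustration under the assumption that each line is processed separately; `replace_second` is a hypothetical helper name, and `newword` is the placeholder replacement from the answer.

```python
import re

# First occurrence, a lazy capture of whatever lies between, second occurrence.
pattern = re.compile(r"word1(.*?)word1")


def replace_second(line: str) -> str:
    """Replace only the second occurrence of word1 on the line.

    The replacement puts back the first word1 and the captured text (\\1),
    then writes newword where the second word1 used to be.
    count=1 makes it explicit that only one match per line is rewritten.
    """
    return pattern.sub(r"word1\1newword", line, count=1)


for sample in [
    "word1 tex text [word1] text",
    "word1 [word1] word1",
    "word1 [word1]",
    "word1 text [word1]",
]:
    print(replace_second(sample))
```

With the lazy `.*?`, the line containing three occurrences keeps its third `word1` untouched; a greedy `.*` would replace the last occurrence instead.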

Basically, think about what you want to do with the result. Ask yourself if there's a bigger string you can match instead of just the second occurrence.

Upvotes: 1

Bergi

Reputation: 664297

To match only something that is preceded by something else, you will need to use a lookbehind. To match something that occurs twice, you can use a capturing group and a backreference:

/(?<=\b\1\b.*?)\b(\w+)\b/

However, most regex engines restrict what a lookbehind may contain (many require a fixed-length pattern), so I'm not sure this is valid in your editor.
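Where a variable-length lookbehind is not available (Python's `re` module, for instance, rejects it), a different technique gives the same information: match from the first occurrence through the second, put the second occurrence in its own capturing group, and read that group's position. The sketch below is not the lookbehind pattern above, only a substitute for it; `second_occurrence_span` is a hypothetical helper name.

```python
import re

# Group 1: some whole word. Group 2: the same word again (backreference \1),
# found as early as possible thanks to the lazy .*? in between.
pattern = re.compile(r"\b(\w+)\b.*?\b(\1)\b")


def second_occurrence_span(line: str):
    """Return (start, end) of the second occurrence of the first repeated
    word on the line, or None if no word is repeated."""
    m = pattern.search(line)
    return m.span(2) if m else None


line = "word1 tex text [word1] text"
span = second_occurrence_span(line)
print(span, line[span[0]:span[1]])
```

The overall match still covers everything from the first occurrence onward; only the span of group 2 isolates the second occurrence. The pattern also picks up whichever word happens to repeat first, not a specific one.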

Upvotes: 1
