Reputation: 131
I'd like to know if a text contains words with 4 characters or more which are repeated 3 or more times in the text (anywhere in the text). If so, set one (and only one) backreference for each word.
I first tried this regex:
(?=\b(\w{4,}+)\b.*\1)
The test result was:
Test 10/39: Not working, sorry. Read the task description again. It matches notword word word
Then I tried:
(?=(\b\w{4,}\b)(?:.*\b\1\b){2,})
Test 22/39: If a certain word is repeated many times, you're setting more than 1 backreference (common mistake, I know). You don't necessarily need to match the first occurrence of the word. Can you avoid a match in >word< word word word, and match word >word< word word? (Hint: match if it's followed by 2 occurences, don't match if it's followed by 3)
Upvotes: 2
Views: 2167
Reputation: 147146
If I understand your question correctly, this should do what you want:
(?=(\b\w{4,}\b)(?:.*\b\1\b){2})(?!(\b\w{4,}\b)(?:.*\b\1\b){3})
It is essentially the same as your second regex: the first lookahead finds a word of 4 or more characters that is followed by at least 2 further occurrences of itself (so it appears at least 3 times), and the word is captured in group 1. The negative lookahead then rejects any position where that word is followed by 3 further occurrences. Together they match only at the occurrence that has exactly 2 occurrences after it, so a word that appears 4 or more times still produces a single match.
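As a quick sanity check, here is a minimal Python sketch that runs this regex over a few sample strings. The helper name `repeated_words` and the sample inputs are made up for illustration; only the pattern comes from the answer, and it assumes Python's `re` module behaves like the engine used by the test site.

```python
import re

# Pattern from the answer: match a position where a 4+ character word
# is followed by 2 more occurrences, but not by 3 more occurrences.
PATTERN = re.compile(
    r"(?=(\b\w{4,}\b)(?:.*\b\1\b){2})"
    r"(?!(\b\w{4,}\b)(?:.*\b\1\b){3})"
)

def repeated_words(text):
    """Return group 1 of every (zero-width) match, in order of position.

    Hypothetical helper, used here only to inspect what the regex captures.
    """
    return [m.group(1) for m in PATTERN.finditer(text)]

# A word occurring 4 times yields one match (at its second occurrence).
print(repeated_words("word word word word"))
# Only two occurrences of "word", and "notword" is a different word.
print(repeated_words("notword word word"))
# Two different words, each occurring 3 times.
print(repeated_words("alpha beta alpha gamma beta alpha beta"))
```

Note that `.` does not match newlines by default, so for multi-line text you would probably need to compile the pattern with `re.DOTALL` (or the equivalent single-line flag in your engine).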
Upvotes: 8