Reputation: 133
All, I need a regex that matches an entire word that contains a character repeated multiple times in a row. For example, given the sentence "here areee some testtting words" I want to match "areee" and "testtting".
A pattern like "([a-z])\1{1,}" matches repeated characters, but it returns "eee" and "ttt" rather than the entire word containing the repeated characters. I've experimented with many variations using "\w" for word characters and "\b" for word boundaries, but can't seem to make it work...thanks!
Upvotes: 0
Views: 7381
Reputation: 626728
You can use
\b(?=\w*(\w)\1)\w+\b
See the regex demo
A slightly more streamlined version without a lookahead (similar to what Federico Piazzi suggests in the comment below) looks like
\b\w*(\w)\1\w*\b
See another regex demo. There is no need to set a quantifier on the backreference \1, since two consecutive identical characters already qualify the word for a match.
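As a rough cross-check outside R, here is a minimal Python sketch that runs both patterns with the standard `re` module. The helper name `words_with_repeats` is made up for illustration. One thing to watch in Python: `findall` returns the contents of Group 1 when the pattern has a capture group, so the sketch uses `finditer` and `group(0)` to get the whole word.

```python
import re

# The two patterns from the answer, unchanged.
LOOKAHEAD = re.compile(r"\b(?=\w*(\w)\1)\w+\b")
NO_LOOKAHEAD = re.compile(r"\b\w*(\w)\1\w*\b")


def words_with_repeats(pattern, text):
    """Return every whole word matched by `pattern` (hypothetical helper)."""
    # group(0) is the full match; Group 1 only holds the repeated character.
    return [m.group(0) for m in pattern.finditer(text)]


text = "here areee some testtting words"
print(words_with_repeats(LOOKAHEAD, text))
print(words_with_repeats(NO_LOOKAHEAD, text))
```

Both calls should return the same two words for this sentence; the lookahead version just keeps the check and the consuming part of the pattern separate.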
Pattern details:
\b - a leading word boundary
(?=\w*(\w)\1) - a positive lookahead that requires at least 1 repeated word character (\w* matches 0+ word characters, (\w) matches and captures a word character into Group 1, and \1 matches the same character captured into Group 1) in the word that will be matched with...
\w+ - 1+ word characters
\b - a trailing word boundary
R code demo for extracting words with repeated consecutive letters:
> library(stringr)
> text = "here areee some testtting words"
> str_extract_all(text, "\\b(?=\\w*(\\w)\\1)\\w+\\b")
[[1]]
[1] "areee" "testtting"
And a demo for removing these words:
> gsub("\\s*\\b(?=\\w*(\\w)\\1)\\w+\\b", "", text, perl = TRUE)
[1] "here some words"
Note the \\s* added at the start of the pattern to also trim the whitespace (if any) before the word being removed. If you also need to get rid of the initial whitespace that is left over when the first word is removed, use trimws().
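The same removal idea can be sketched in Python, assuming an empty replacement string and `str.strip()` standing in for `trimws()`. The function name `drop_repeated_words` is hypothetical.

```python
import re

# Same pattern as the R demo: optional leading whitespace, then a word
# that contains two identical consecutive word characters.
REPEATED_WORD = re.compile(r"\s*\b(?=\w*(\w)\1)\w+\b")


def drop_repeated_words(text):
    """Remove words with a doubled character, then trim the ends."""
    # Replacing with "" leaves the whitespace after the removed word in
    # place; strip() handles the case where the first word was removed.
    return REPEATED_WORD.sub("", text).strip()


print(drop_repeated_words("here areee some testtting words"))
print(drop_repeated_words("areee some words"))
```

Without the final `strip()`, the second call would leave a leading space, which is the case the `trimws()` remark is about.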
NOTE: If you plan to only check for repeated letters, use \b(?=\w*([a-zA-Z])\1)\w+\b
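To see why the letters-only variant matters, here is a small Python sketch comparing the two lookaheads on a made-up sample string. Since \w also covers digits and the underscore, the general pattern picks up tokens like "1001" and "x__y" as well.

```python
import re

ANY_WORD_CHAR = re.compile(r"\b(?=\w*(\w)\1)\w+\b")
LETTERS_ONLY = re.compile(r"\b(?=\w*([a-zA-Z])\1)\w+\b")

# Hypothetical sample: a doubled digit, a doubled underscore, a doubled letter.
sample = "code 1001 x__y areee"

any_hits = [m.group(0) for m in ANY_WORD_CHAR.finditer(sample)]
letter_hits = [m.group(0) for m in LETTERS_ONLY.finditer(sample)]

print(any_hits)     # includes the digit and underscore cases
print(letter_hits)  # only the word with a doubled letter
```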
Upvotes: 7
Reputation:
All you needed to do was pad it with the necessary gibberish
^[a-z]*([a-z])\1{1}[a-z]*$
As said in the comments, {1} is never necessary. That makes:
^[a-z]*([a-z])\1[a-z]*$
View a demo here
https://regex101.com/r/dT6dK8/1
Remove the ^ and $ if you want it to work inline.
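A quick Python sketch of the difference, assuming lowercase-only input as in the pattern: the anchored form tests a single word on its own, while the unanchored form can pull words out of a sentence.

```python
import re

ANCHORED = re.compile(r"^[a-z]*([a-z])\1[a-z]*$")
INLINE = re.compile(r"[a-z]*([a-z])\1[a-z]*")

# Anchored: the whole string must be one lowercase word with a doubled letter.
single_word_ok = bool(ANCHORED.search("testtting"))
sentence_ok = bool(ANCHORED.search("here areee some"))

# Unanchored: scan a sentence and collect each matching word.
inline_hits = [m.group(0) for m in INLINE.finditer("here areee some testtting words")]

print(single_word_ok, sentence_ok)
print(inline_hits)
```

The anchored pattern rejects the sentence because the space stops `[a-z]*` before `$` can be reached.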
Upvotes: 3