Reputation: 319
I want to write a regular expression in Python that finds pairs of consecutive three-character words in a string, where both words share the same first and last characters and the middle character can be anything.
Sample string:
s = 'timtimdsikmunmunjuityakbonbonjdjjdkitkatghdnjsamsunksuwjkhokhojeuhjjimjamjsju'
I want to extract all the highlighted words (nmunju, kitkat, samsun, jimjam) from the string above.
My attempt, which does not do what I need:
import re
s='timtimdsikmunmunjuityakbonbonjdjjdkitkatghdnjsamsunksuwjkhokhojeuhjjimjamjsju'
re.findall(r'([a-z].[a-z])(\1)',s)
This gives me:
[('tim', 'tim'), ('mun', 'mun'), ('bon', 'bon'), ('kho', 'kho')]
I want this:
[('kit', 'kat'), ('sam', 'sun'), ('jim', 'jam'), ('nmu', 'nju')]
Thanks
Upvotes: 1
Views: 460
Reputation: 260975
You can use capturing groups and references:
s='timtimdsikmunmunjuityakbonbonjdjjdkitkatghdnjsamsunksuwjkhokhojeuhjjimjamjsju'
import re
out = re.findall(r'((.).(.)\2.\3)', s)
[e[0] for e in out]
output:
['timtim', 'munmun', 'bonbon', 'kitkat', 'khokho', 'jimjam']
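To skip exact repeats, capture the middle character as well and use a negative lookahead so that the second word's middle character must differ:
[e[0] for e in re.findall(r'((.)(.)(.)\2(?!\3).\4)', s)]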
output:
['nmunju', 'kitkat', 'jimjam']
>>> [(e[0][:3], e[0][3:]) for e in re.findall(r'((.)(.)(.)\2(?!\3).\4)', s)]
[('nmu', 'nju'), ('kit', 'kat'), ('jim', 'jam')]
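One caveat with re.findall is that it returns non-overlapping matches, which is why 'munmun' hides 'nmunju' in the first pattern. Wrapping the pattern in a zero-width lookahead lets the search restart at every position. The following is only a minimal sketch: the helper name overlapping_pairs and the second sample string 'abcadcaec' are made up for illustration.

```python
import re

# Same pattern as above, wrapped in a lookahead so that a match consumes
# no characters and the next search starts one position later.
PAIR = re.compile(r'(?=((.)(.)(.)\2(?!\3).\4))')

def overlapping_pairs(text):
    """Return (first, second) word pairs, including overlapping ones."""
    return [(m.group(1)[:3], m.group(1)[3:]) for m in PAIR.finditer(text)]

s = 'timtimdsikmunmunjuityakbonbonjdjjdkitkatghdnjsamsunksuwjkhokhojeuhjjimjamjsju'
print(overlapping_pairs(s))
# [('nmu', 'nju'), ('kit', 'kat'), ('jim', 'jam')]

# Hypothetical input where two pairs overlap: 'abc adc' and 'adc aec'.
print(overlapping_pairs('abcadcaec'))
# [('abc', 'adc'), ('adc', 'aec')]
```

On the sample string the result is the same as before; the difference only shows up when one pair starts inside another.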
Upvotes: 6
Reputation: 785286
You can use this regex in Python:
(?P<first>([a-z])(.)([a-z]))(?P<second>\2(?!\3).\4)
Group first is for the first word and group second is for the second word. (?!\3) is a negative lookahead that makes sure the middle character of the second word differs from the middle character of the first word.
import re
rx = re.compile(r"(?P<first>([a-z])(.)([a-z]))(?P<second>\2(?!\3).\4)")
s = 'timtimdsikmunmunjuityakbonbonjdjjdkitkatghdnjsamsunksuwjkhokhojeuhjjimjamjsju'
for m in rx.finditer(s): print(m.group('first'), m.group('second'))
Output:
nmu nju
kit kat
jim jam
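If the pairs are needed as a list of tuples, as in the question, the named groups can be collected in one pass. This is a small sketch built on the same pattern; the variable name pairs is just illustrative. It also shows why ('sam', 'sun') from the question's wanted list is not returned: the two words end in different letters, so they fail the "same last character" rule.

```python
import re

rx = re.compile(r"(?P<first>([a-z])(.)([a-z]))(?P<second>\2(?!\3).\4)")
s = 'timtimdsikmunmunjuityakbonbonjdjjdkitkatghdnjsamsunksuwjkhokhojeuhjjimjamjsju'

# m.group() with several names returns a tuple of the matched substrings.
pairs = [m.group('first', 'second') for m in rx.finditer(s)]
print(pairs)
# [('nmu', 'nju'), ('kit', 'kat'), ('jim', 'jam')]

# 'sam' ends in 'm' but 'sun' ends in 'n', so the backreference \4 fails.
print(rx.search('samsun'))
# None
```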
Upvotes: 2
Reputation: 184
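You can also do it with a plain for loop, which may be faster:
s = 'timtimdsikmunmunjuityakbonbonjdjjdkitkatghdnjsamsunksuwjkhokhojeuhjjimjamjsju'
result2 = []
# Compare each three-character window with the one right after it.
# Stopping at len(s) - 5 keeps s[i+5] in range, so no try/except is needed.
for i in range(len(s) - 5):
    # Only the first and last characters are compared, so exact repeats
    # such as ('tim', 'tim') are kept as well.
    if s[i] == s[i+3] and s[i+2] == s[i+5]:
        result2.append((s[i:i+3], s[i+3:i+6]))
print(result2)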
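Whether the loop really is faster depends on the input and the Python version, so it is worth measuring. Here is a rough sketch using timeit; the function names loop_pairs and regex_pairs are made up for this comparison, and the loop adds a middle-character check (not in the loop above) so that both functions return the same pairs.

```python
import re
import timeit

s = 'timtimdsikmunmunjuityakbonbonjdjjdkitkatghdnjsamsunksuwjkhokhojeuhjjimjamjsju'
PAIR = re.compile(r'(.)(.)(.)\1(?!\2).\3')

def loop_pairs(text):
    """Sliding-window version; the middle characters must differ."""
    out = []
    for i in range(len(text) - 5):
        if (text[i] == text[i+3] and text[i+2] == text[i+5]
                and text[i+1] != text[i+4]):
            out.append((text[i:i+3], text[i+3:i+6]))
    return out

def regex_pairs(text):
    """Regex version; finditer returns non-overlapping matches."""
    return [(m.group(0)[:3], m.group(0)[3:]) for m in PAIR.finditer(text)]

# Both approaches agree on the sample string before we time them.
assert loop_pairs(s) == regex_pairs(s)

loop_time = timeit.timeit(lambda: loop_pairs(s), number=2000)
regex_time = timeit.timeit(lambda: regex_pairs(s), number=2000)
print(f'loop:  {loop_time:.4f}s')
print(f'regex: {regex_time:.4f}s')
```

The timings vary from machine to machine, so no numbers are shown here. Note that the two functions can disagree on inputs with overlapping pairs, because the loop checks every position while finditer skips past each match.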
Upvotes: 1
Reputation: 10545
There is always the pure Python way:
s = 'timtimdsikmunmunjuityakbonbonjdjjdkitkatghdnjsamsunksuwjkhokhojeuhjjimjamjsju'
result = []
for i in range(len(s) - 5):
    word = s[i:(i+6)]
    if (word[0] == word[3] and word[2] == word[5] and word[1] != word[4]):
        result.append(word)
print(result)
['nmunju', 'kitkat', 'jimjam']
Upvotes: 3