How to correctly match the start and end of any word with exactly the same sequence while having characters in between?

Question

I am using a regex on UNIX to match any word that starts and ends with the same sequence. In between, the word must also contain one specific character (say 'm') followed by any number of characters.

For instance:

cahellomca is valid

salommoresa is valid

lonoletterlo is not valid

homlastho is valid

Here is what I have been able to do. However, it returns zero for a list with the previous words (it should return 3).

% egrep -c '^([a-z]{2})(z{1}.*)\2\1$' list

The fourth bird · Accepted Answer

You can use a single capture group at the start.

Then match the required character (here m) surrounded by optional characters [a-z]*, and use the backreference \1 at the end of the string so the word must end with whatever the group captured.

^([a-z]+)[a-z]*m[a-z]*\1$
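To see how the pattern treats each sample word from the question, here is a minimal sketch. It assumes a grep whose -E mode supports backreferences (GNU grep does); check_word is a hypothetical helper name used only for illustration.

```shell
# Pattern from the answer: a leading sequence, an 'm' somewhere after it,
# and the same sequence again at the very end.
pattern='^([a-z]+)[a-z]*m[a-z]*\1$'

# check_word is a hypothetical helper (not part of grep): it tests one word.
check_word() {
  if printf '%s\n' "$1" | grep -qE "$pattern"; then
    echo "$1 is valid"
  else
    echo "$1 is not valid"
  fi
}

# Run the check on the four words from the question.
for w in cahellomca salommoresa lonoletterlo homlastho; do
  check_word "$w"
done
```

Because [a-z]+ accepts a sequence of any length, a word such as ama (one repeated letter around the m) is also accepted. If the repeated sequence must be exactly two letters, as in the attempt from the question, ([a-z]{2}) can replace ([a-z]+).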

Example

egrep -c '^([a-z]+)[a-z]*m[a-z]*\1$' list

Output

3

Or, as suggested by @anubhava, using grep -E instead of egrep:

grep -cE '^([a-z]+)[a-z]*m[a-z]*\1$' list
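To reproduce the counts end to end, the sketch below writes the four sample words to a temporary file (standing in for the file named list, whose real contents are assumed to be just those words) and compares the pattern from the question with the one above. It again assumes a grep with backreference support in -E mode.

```shell
# Temporary stand-in for the 'list' file; its contents are an assumption
# based on the four sample words in the question.
list=$(mktemp)
printf '%s\n' cahellomca salommoresa lonoletterlo homlastho > "$list"

# Pattern from the question: it demands a 'z' right after the first two
# letters and then group 2 repeated, so nothing matches.
# grep -c exits non-zero when the count is 0, hence '|| true'.
old_count=$(grep -cE '^([a-z]{2})(z{1}.*)\2\1$' "$list" || true)

# Pattern from the answer: one capture group, reused at the end.
new_count=$(grep -cE '^([a-z]+)[a-z]*m[a-z]*\1$' "$list" || true)

echo "question pattern: $old_count"
echo "answer pattern:   $new_count"

rm -f "$list"
```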
