Reputation: 23
How do I make a regex in Python that returns a string in which all underscores are between lowercase letters?
For example, it should detect and return: 'aa_bb_cc', 'swd_qq', 'hello_there_friend'
But it should not return these: 'aA_bb', 'aa_', '_ddQ', 'aa_baa_2cs'
My code is ([a-z]+_[a-z]+)+, but it returns only one underscore. It should return all underscores separated by lowercase letters.
For example, when I pass the string "aab_cbbbc_vv", it returns only 'aab_cbbbc' instead of 'aab_cbbbc_vv'.
Thank you
Upvotes: 2
Views: 1380
Reputation: 163362
The reason that you get only results with 1 underscore for your example data is that ([a-z]+_[a-z]+)+ repeats a match of [a-z]+, then an underscore, and then again [a-z]+.
That can match a_b, or the whole of a_bc_d (as a_b followed by c_d), but only part of a_b_c, as there has to be at least one char a-z present before each _ in every iteration. In a search, the greedy [a-z]+ after the first _ also takes all the letters up to the next _, which leaves none for the start of the next iteration.
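A minimal sketch of that behaviour, using the pattern from the question. The helper name `searched` is only illustrative and not part of the original post:

```python
import re

# Pattern from the question: every iteration needs letters on both sides of "_".
original = re.compile(r"([a-z]+_[a-z]+)+")

def searched(text):
    """Return the text of the first match found by search(), or None."""
    m = original.search(text)
    return m.group(0) if m else None

print(searched("a_b"))           # a_b
print(searched("a_b_c"))         # a_b
print(searched("aab_cbbbc_vv"))  # aab_cbbbc

# Requiring the whole string to match shows which inputs the pattern can cover at all.
print(original.fullmatch("a_bc_d") is not None)  # True
print(original.fullmatch("a_b_c") is not None)   # False
```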
You could update your pattern to:
\b[a-z]+(?:_[a-z]+)+\b
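As a quick check, the pattern can be run over the sample strings from the question. This is only a sketch: the variable names and the choice to join the samples into one space-separated string are assumptions made for illustration.

```python
import re

# Suggested pattern: word boundary, letters, then one or more "_letters" groups.
pattern = re.compile(r"\b[a-z]+(?:_[a-z]+)+\b")

# Sample strings from the question, joined with spaces (an assumption for this demo).
text = "aa_bb_cc swd_qq hello_there_friend aA_bb aa_ _ddQ aa_baa_2cs aab_cbbbc_vv"

# The group is non-capturing, so findall() returns the full matches.
matches = pattern.findall(text)
print(matches)
# ['aa_bb_cc', 'swd_qq', 'hello_there_friend', 'aab_cbbbc_vv']
```

Because `_`, digits and uppercase letters all count as word characters, the word boundaries keep the pattern from matching part of a longer token such as aa_baa_2cs.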
Explanation
\b - a word boundary
[a-z]+ - match 1+ chars a-z
(?:_[a-z]+)+ - repeat 1+ times matching _ and 1+ chars a-z
\b - a word boundary
Upvotes: 1
Reputation: 269
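Try this code:
import re
# Sample strings from the question, separated by spaces
s = "aa_bb_cc swd_qq hello_there_friend aA_bb aa_ _ddQ aa_baa_2cs"
# The underscore does not need to be escaped in a regex
print(re.findall(r"[a-z][a-z_]+_[a-z]+", s))
The output should be
['aa_bb_cc', 'swd_qq', 'hello_there_friend', 'aa_baa']
Note that 'aa_baa' is a partial match taken from 'aa_baa_2cs'.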
Upvotes: 1
Reputation: 16660
Your regex is almost correct. If you change it to:
^([a-z]+)(_[a-z]+)+$
it works as intended.
^ - matches the beginning of the string
$ - matches the end of the string
You need these so that you are not getting partial matches when matching the strings you don't want to get matched.
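A small sketch of how the anchored pattern could be applied, assuming each candidate is checked as a separate string (the list and variable names here are made up for illustration):

```python
import re

# Anchored pattern: the whole string must be letters joined by single underscores.
anchored = re.compile(r"^([a-z]+)(_[a-z]+)+$")

# Sample strings from the question, each tested on its own.
candidates = [
    "aa_bb_cc", "swd_qq", "hello_there_friend", "aab_cbbbc_vv",
    "aA_bb", "aa_", "_ddQ", "aa_baa_2cs",
]

# Keep only the strings that the pattern matches from start to end.
accepted = [c for c in candidates if anchored.match(c)]
print(accepted)
# ['aa_bb_cc', 'swd_qq', 'hello_there_friend', 'aab_cbbbc_vv']
```

Note that in Python `$` also matches just before a trailing newline, so `re.fullmatch` with the unanchored pattern may be a better fit if the input can end in a newline.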
Upvotes: 3