cxs101

Reputation: 23

Python Regex to detect underscore between letters

How do I write a regex in Python that returns strings in which every underscore sits between lowercase letters? For example, it should detect and return: 'aa_bb_cc', 'swd_qq', 'hello_there_friend'

But it should not return these: 'aA_bb', 'aa_', '_ddQ', 'aa_baa_2cs'

My pattern is ([a-z]+_[a-z]+)+, but it matches only one underscore. It should match all underscores separated by lowercase letters.

For example, when I pass the string "aab_cbbbc_vv", it returns only 'aab_cbbbc' instead of 'aab_cbbbc_vv'.

Thank you

Upvotes: 2

Views: 1380

Answers (3)

The fourth bird

Reputation: 163362

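The reason you get only results with 1 underscore for your example data is that every repetition of ([a-z]+_[a-z]+)+ has to match [a-z]+, then an underscore, and then [a-z]+ again.

The pattern can therefore match a_b, and it can match a_bc_d as a whole when the entire string has to match (one repetition takes a_b, the next takes c_d). For a_b_c it can only give a partial match, because each repetition needs at least one a-z character before its _, and after a_b the next repetition would have to start at _c. In an unanchored search the first repetition is also greedy, which is why "aab_cbbbc_vv" stops at 'aab_cbbbc'.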
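The behaviour described above can be reproduced with a short sketch. The helper name first_match is only an illustration and is not part of the question; the pattern is the one from the question.

```python
import re

# The pattern from the question, unchanged.
pattern = re.compile(r"([a-z]+_[a-z]+)+")

def first_match(text):
    """Return the text of the first match found by search(), or None."""
    m = pattern.search(text)
    return m.group(0) if m else None

# Each repetition needs letters before its underscore, so the match
# stops before the second underscore.
print(first_match("a_b_c"))          # a_b
print(first_match("aab_cbbbc_vv"))   # aab_cbbbc

# When the whole string has to match, backtracking lets the repetitions
# split as a_b + c_d, but a_b_c still cannot be covered.
print(pattern.fullmatch("a_bc_d") is not None)  # True
print(pattern.fullmatch("a_b_c") is not None)   # False
```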

You could update your pattern to:

\b[a-z]+(?:_[a-z]+)+\b

Explanation

  • \b A word boundary
  • [a-z]+ Match 1+ chars a-z
  • (?:_[a-z]+)+ Repeat 1+ times matching _ and 1+ chars a-z
  • \b A word boundary

regex demo
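As a quick check of the word-boundary pattern, here is a minimal sketch that runs it over the sample strings from the question, joined by spaces. The variable names are illustrative only.

```python
import re

# Letters first, then one or more "_letters" groups, with word
# boundaries on both sides so partial words are not returned.
word_pattern = re.compile(r"\b[a-z]+(?:_[a-z]+)+\b")

text = "aa_bb_cc swd_qq hello_there_friend aA_bb aa_ _ddQ aa_baa_2cs"

# The group is non-capturing, so findall returns the full matches.
matches = word_pattern.findall(text)
print(matches)  # ['aa_bb_cc', 'swd_qq', 'hello_there_friend']
```

Because _ and digits count as word characters, \b does not match inside 'aa_baa_2cs', so no partial 'aa_baa' is returned.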

Upvotes: 1

jonathan

Reputation: 269

Try this code:

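import re
s = "aa_bb_cc swd_qq hello_there_friend aA_bb aa_ _ddQ aa_baa_2cs"
# One lowercase letter, then letters/underscores, then "_" and 1+ letters.
# There are no boundaries, so part of a longer word can still match.
print(re.findall(r"[a-z][a-z_]+_[a-z]+", s))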

The output should be (the last item is a partial match from 'aa_baa_2cs'):

['aa_bb_cc', 'swd_qq', 'hello_there_friend', 'aa_baa']

Upvotes: 1

sophros

Reputation: 16660

Your regex is almost correct. If you change it to:

^([a-z]+)(_[a-z]+)+$

It works, as you can check here.

^ - matches the beginning of the string

$ - the end of the string

You need these anchors so that a string you don't want is rejected as a whole instead of producing a partial match. This assumes each candidate is tested as its own string.
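A minimal sketch of the anchored approach, assuming each candidate is checked one at a time. The list and variable names are illustrative; the strings are the ones from the question.

```python
import re

# Anchored pattern: the whole string must be letters joined by
# single underscores.
anchored = re.compile(r"^([a-z]+)(_[a-z]+)+$")

candidates = [
    "aa_bb_cc", "swd_qq", "hello_there_friend",  # should be accepted
    "aA_bb", "aa_", "_ddQ", "aa_baa_2cs",        # should be rejected
]

# Keep only the strings that match from start to end.
accepted = [s for s in candidates if anchored.match(s)]
print(accepted)  # ['aa_bb_cc', 'swd_qq', 'hello_there_friend']
```

In Python, $ also matches just before a trailing newline, so re.fullmatch with the same pattern minus the anchors is a slightly stricter alternative.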

Upvotes: 3
