Zach Gittelman

Reputation: 31

Simple Regex Python

I am reading a line from a file and want to split it into words, where words are delimited by non-alphanumeric ASCII characters or by a break tag (<br>). I am using re.split, but I am having trouble working out the correct pattern. The code below yields:

split = re.split(r'(<br>)|(\W+)', 'I code<br>A project.')
split = ['', None, 'I', '', None, 'code', '', None, '<', '', None, 'br',
         '',None, '>', '', None, 'A', '', None, 'project.']

I believed the pattern above would match either a break tag or a run of non-alphanumeric characters, but clearly it is incorrect. I am having trouble understanding regex, so any help fixing this would be appreciated. I would like the result to look like this once it is split properly:

split = ['I', 'code', 'A', 'project']

Upvotes: 3

Views: 54

Answers (1)

Mark

Reputation: 108507

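You don't need the capturing groups (). When the pattern contains capturing groups, re.split also returns the text each group matched (and None for a group that did not take part in the match), which is where the extra entries come from. Without the groups: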

>>> re.split(r'<br>|\W+', 'I code<br>A    project.')
['I', 'code', 'A', 'project', '']
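
The trailing '' is there because the string ends with a delimiter (the final '.'). If that is unwanted, one option is to filter out empty strings afterwards. Here is a minimal sketch using the string from the question; the variable names (text, with_groups, without_groups, words) are my own and only for illustration:

```python
import re

text = 'I code<br>A project.'

# With capturing groups, re.split also returns what each group matched,
# and None for the group that did not take part in a given match.
with_groups = re.split(r'(<br>)|(\W+)', text)

# Without groups, only the text between the delimiters is returned.
without_groups = re.split(r'<br>|\W+', text)

# Drop the empty string left behind by the trailing '.'.
words = [w for w in without_groups if w]
print(words)
```

Note that <br> has to come first in the alternation. With \W+ alone, the '<' and '>' would be treated as delimiters and 'br' would show up as a word.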

Upvotes: 1
