Simple Regex Python

Question

I am reading a line from a file and want to split it into words wherever there is a non-alphanumeric ASCII character or a break tag (<br>), using re.split, but I am having trouble working out the correct pattern. The code below yields:

split = re.split(r'(<br>)|(\W+)', 'I code<br>A project.')
split = ['', None, 'I', '', None, 'code', '', None, '<', '', None, 'br',
         '', None, '>', '', None, 'A', '', None, 'project.']

I believed the pattern above would match either a break tag or a non-alphanumeric character, but clearly it is incorrect. I am having trouble understanding regex, so any help fixing this would be appreciated. I would like the result to look like this once it is split properly:

split = ['I', 'code', 'A', 'project']
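
For reference, here is one possible way to get that result. It is only a sketch, not a definitive answer. It relies on two documented behaviours of re.split: text matched by capturing groups is included in the result (with None for a group that did not take part in the match), and a delimiter at the very start or end of the string leaves an empty string in the list. The names DELIMITERS and split_words are made up for this example, and it assumes the break tag always appears literally as <br>.

```python
import re

# Non-capturing group (?:...) so the delimiters are not echoed into the result.
# "<br>" is listed first so the whole tag is consumed before \W can match "<".
# re.ASCII restricts \W to characters outside [a-zA-Z0-9_].
DELIMITERS = re.compile(r'(?:<br>|\W)+', re.ASCII)

def split_words(line):
    """Split on <br> tags and runs of non-word characters, dropping empty strings."""
    return [word for word in DELIMITERS.split(line) if word]

print(split_words('I code<br>A project.'))  # ['I', 'code', 'A', 'project']
```

Note that \W does not treat the underscore as a delimiter, and variants such as <br/> or <BR> are not handled by this pattern; it would need extending if those can occur in the file.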
