Reputation: 5657
I'm trying to tokenize some code and would like to keep the delimiters when I split a string.
For example, I would like to keep any occurrences of ., (, ), ; and ~.
I have been using re.split:
line = 'Keyboard.keyPressed();'
re.split(r'([\.\(\)\;\~])', line)
However, my current use of re.split produces some unnecessary empty strings in the resulting list:
['Keyboard', '.', 'keyPressed', '(', '', ')', '', ';', '']
How can I fix this to exclude the empty strings?
Upvotes: 1
Views: 234
Reputation: 879
Keep your split simple for clarity and just remove empty strings.
import re
line = 'Keyboard.keyPressed();'
split = re.split(r'([\.\(\)\;\~])', line)
cleared = list(filter(None, split)) # <- Add this line
print(cleared)
See: how to remove elements from a list
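As a rough sketch of the same idea, the filter(None, ...) call above can also be written as a list comprehension, which makes it explicit that every falsy item (here, the empty strings) is dropped. The names parts and tokens are only illustrative, and the backslashes are left out because ., (, ), ; and ~ need no escaping inside a character class:

```python
import re

line = 'Keyboard.keyPressed();'

# With a capturing group, re.split keeps the delimiters in the result.
# It also emits '' wherever two delimiters are adjacent or a delimiter
# ends the string.
parts = re.split(r'([.();~])', line)

# Keep only the non-empty pieces.
tokens = [p for p in parts if p]
print(tokens)  # ['Keyboard', '.', 'keyPressed', '(', ')', ';']
```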
Upvotes: 1
Reputation: 627488
You may use
re.findall(r'[^.();~]+|[.();~]', line)
See the regex demo
re.findall will return a list of all non-overlapping matches in the input string. The pattern matches:
[^.();~]+ - 1 or more (due to the + quantifier) chars other than ., (, ), ; and ~
| - or
[.();~] - a single occurrence of ., (, ), ; or ~
See the Python demo online:
import re
line = 'Keyboard.keyPressed();'
print(re.findall(r'[^.();~]+|[.();~]', line))
# => ['Keyboard', '.', 'keyPressed', '(', ')', ';']
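If the set of delimiters may change, the same pattern can be built from a string of delimiter characters. This is only a sketch: tokenize and its delimiters parameter are hypothetical names, not part of the re module, and it assumes delimiters is a non-empty string of single characters. re.escape is used so that characters such as ], ^ or - stay literal inside the character class.

```python
import re

def tokenize(line, delimiters='.();~'):
    """Split line into tokens, keeping each delimiter as its own token."""
    # Escape the delimiters so they are safe inside [...].
    cls = re.escape(delimiters)
    # Either a run of non-delimiter chars, or exactly one delimiter.
    pattern = rf'[^{cls}]+|[{cls}]'
    return re.findall(pattern, line)

print(tokenize('Keyboard.keyPressed();'))
# => ['Keyboard', '.', 'keyPressed', '(', ')', ';']
```

Because the pattern never matches an empty string, no filtering step is needed afterwards. Whitespace is not treated as a delimiter here, so it stays inside the surrounding token.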
Upvotes: 0