doctopus

Reputation: 5657

Splitting a string but keeping the delimiter

I'm trying to tokenize some code and would like to keep the delimiters when I split a string.

For example, I would like to keep any occurrences of ., (, ), ;, ~.

I have been using re.split:

import re
line = 'Keyboard.keyPressed();'
re.split(r'([\.\(\)\;\~])', line)

However, my current use of re.split produces some unnecessary empty strings in the resulting list:

['Keyboard', '.', 'keyPressed', '(', '', ')', '', ';', '']

How can I fix this to exclude the empty strings?

Upvotes: 1

Views: 234

Answers (2)

Florian Fasmeyer

Reputation: 879

Keep your split simple for clarity and just remove empty strings.

import re
line = 'Keyboard.keyPressed();'
split = re.split(r'([\.\(\)\;\~])', line)

cleared = list(filter(None, split))   # <- Add this line

print(cleared)

See: how to remove elements from a list
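
For context, the empty strings appear because re.split returns the text between consecutive matches, and in '();' the delimiters sit next to each other (and at the end of the line), so the text between them is empty. The same cleanup can be written as a list comprehension. This is only a minimal sketch; the helper name split_keep_delims is made up for illustration:

```python
import re

def split_keep_delims(line):
    # Hypothetical helper name, not part of the re module.
    # The capturing group makes re.split keep the delimiters in the result.
    parts = re.split(r'([.();~])', line)
    # Drop the empty strings produced between adjacent delimiters
    # and after a trailing delimiter.
    return [part for part in parts if part]

print(split_keep_delims('Keyboard.keyPressed();'))
```

Inside a character class the characters ., (, ), ; and ~ do not need backslashes, so the pattern is written without them here.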

Upvotes: 1

Wiktor Stribiżew

Reputation: 627488

You may use

re.findall(r'[^.();~]+|[.();~]', line)

See the regex demo

re.findall returns a list of all non-overlapping matches in the input string. The pattern matches the following:

  • [^.();~]+ - 1 or more (due to + quantifier) chars other than ., (, ), ; and ~
  • | - or
  • [.();~] - a single occurrence of ., (, ), ; or ~.

See Python demo online:

import re
line = 'Keyboard.keyPressed();'
print(re.findall(r'[^.();~]+|[.();~]', line))
# => ['Keyboard', '.', 'keyPressed', '(', ')', ';']
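
If the set of delimiters may change, the same findall approach could be wrapped in a small function that builds the character class from a string of delimiters. This is a sketch under assumptions: the function name tokenize and its delimiters parameter are hypothetical, and delimiters is assumed to be a non-empty string of single characters.

```python
import re

def tokenize(line, delimiters='.();~'):
    # Hypothetical helper; assumes `delimiters` is a non-empty string
    # of single-character delimiters.
    # Escape each character so that ones like ']' or '^' stay literal
    # inside the character class.
    char_class = ''.join(re.escape(ch) for ch in delimiters)
    # Either a run of non-delimiter characters, or one delimiter.
    pattern = re.compile(rf'[^{char_class}]+|[{char_class}]')
    return pattern.findall(line)

print(tokenize('Keyboard.keyPressed();'))
```

Because every character of the input is matched by one of the two alternatives, no empty strings are produced and nothing needs to be filtered afterwards.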

Upvotes: 0
