Reputation: 388
I've seen many solutions using re.split, but they don't solve my problem. I want to split my string and keep some of the delimiter characters in the resulting list. It's hard to explain, so here is an example:
Text:
'print("hello world");'
The result I want:
["print", "(", "\"", "hello", "world", "\"", ")", ";"]
Things like re.split would give me:
["print", "hello", "world"]
How can I get the wanted result?
Upvotes: 2
Views: 118
Reputation: 20669
You can try this.
import re
text='print("hello world");'
parsed=re.findall(r'(\w+|[^a-zA-Z\s])',text)
print(parsed)
#['print', '(', '"', 'hello', 'world', '"', ')', ';']
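\w+ - captures every word (a run of letters, digits, or underscores).
[^a-zA-Z\s] - captures any single character that is neither an ASCII letter nor whitespace. Because \w+ is tried first, in practice this picks up punctuation and other symbols one at a time.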
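As a minimal sketch, the same pattern can be wrapped in a small helper. The names TOKEN_RE and tokenize are hypothetical, chosen only for illustration, and the outer parentheses are dropped because findall returns the whole match when the pattern has no group. The second call shows that digits and underscores are absorbed by \w+, so only punctuation reaches the character class.

```python
import re

# Hypothetical helper names; the pattern itself is the one from this answer.
TOKEN_RE = re.compile(r'\w+|[^a-zA-Z\s]')

def tokenize(text):
    """Return words and single punctuation characters, skipping whitespace."""
    return TOKEN_RE.findall(text)

print(tokenize('print("hello world");'))
# ['print', '(', '"', 'hello', 'world', '"', ')', ';']

print(tokenize('total_2 = x+1'))
# ['total_2', '=', 'x', '+', '1']
```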
EDIT: To capture integers and floats as well, use the regular expression \d+\.\d+|\d+|\w+|[^a-zA-Z\s]
\d+\.\d+ - captures floats. It has to come before \d+ and \w+, otherwise 3.15 would be split into 3, . and 15.
\d+ - captures integers.
a='print("hello world",[1,2,3,4,3.15]);'
print(re.findall(r'\d+\.\d+|\d+|\w+|[^a-zA-Z\s]',a))
#['print', '(', '"', 'hello', 'world', '"', ',', '[', '1', ',', '2', ',', '3', ',', '4', ',', '3.15', ']', ')', ';']
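The order of the alternatives matters here, because the regex engine takes the first alternative that matches at each position. A small sketch to illustrate (the variable names and the shortened input are mine, not part of the original example):

```python
import re

text = '[1,2,3.15]'

# Float alternative first, as in the pattern above: '3.15' stays one token.
float_first = re.findall(r'\d+\.\d+|\d+|\w+|[^a-zA-Z\s]', text)

# Integer alternative first: '\d+' wins at '3', so the float is split up.
int_first = re.findall(r'\d+|\d+\.\d+|\w+|[^a-zA-Z\s]', text)

print(float_first)
# ['[', '1', ',', '2', ',', '3.15', ']']

print(int_first)
# ['[', '1', ',', '2', ',', '3', '.', '15', ']']
```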
Upvotes: 6
Reputation: 1805
Try this:
import re
re.findall(r"[A-Za-z@#]+|\S", 'print("hello world");')
# ['print', '(', '"', 'hello', 'world', '"', ')', ';']
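One thing to be aware of with this pattern: the first alternative only covers letters, @ and #, so everything else, digits included, falls through to \S and comes out one character at a time. A quick sketch (the second input string is my own example, not from the question):

```python
import re

pattern = re.compile(r"[A-Za-z@#]+|\S")

print(pattern.findall('print("hello world");'))
# ['print', '(', '"', 'hello', 'world', '"', ')', ';']

# Digits are not in the character class, so each one becomes its own token.
print(pattern.findall('x1 = 42'))
# ['x', '1', '=', '4', '2']
```

If multi-digit numbers should stay together, adding 0-9 to the character class or using \w+ for the first alternative would likely be the simplest change.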
Upvotes: 3