MorganFreeFarm

Reputation: 3733

Python split by multiple separators, including space?

Input:

Some Text here: Java, PHP, JS, HTML 5, CSS, Web, C#, SQL, databases, AJAX, etc.

Code:

import re

input_words = list(re.split('\s+', input()))
print(input_words)

This works perfectly and returns:

['Some', 'Text', 'here:', 'Java,', 'PHP,', 'JS,', 'HTML', '5,', 'CSS,', 'Web,', 'C#,', 'SQL,', 'databases,', 'AJAX,', 'etc.']

But when I add some other separators, like this:

import re

input_words = list(re.split('\s+ , ; : . ! ( ) " \' \ / [ ] ', input()))
print(input_words)

It doesn't split by spaces anymore. Where am I going wrong?

The expected output would be:

['Some', 'Text', 'here', 'Java', 'PHP', 'JS', 'HTML', '5', 'CSS', 'Web', 'C#', 'SQL', 'databases', 'AJAX', 'etc']

Upvotes: 2

Views: 139

Answers (3)

Asif

Reputation: 237

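Write the separators inside square brackets (a character class) and put the + quantifier outside the brackets. Inside the brackets, + only matches a literal plus sign. A trailing separator leaves an empty string at the end of the result, so filter those out. Hope it helps.

import re

# A run of whitespace or punctuation counts as one separator; drop empty strings.
input_words = [w for w in re.split(r'[\s,:.!()]+', input()) if w]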
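As a minimal sketch, here is the character-class approach applied to the sample input from the question, using the full separator list the asker tried. The names text and separators are illustrative only, and the filter on empty strings is an assumption about what the asker wants (the trailing period otherwise leaves an empty string at the end).

```python
import re

# Sample input copied from the question.
text = "Some Text here: Java, PHP, JS, HTML 5, CSS, Web, C#, SQL, databases, AJAX, etc."

# Character class listing every separator; the + outside the brackets
# makes a run of adjacent separators (", " or ": ") count as one split point.
separators = r'[\s,;:.!()"\'\\/\[\]]+'

# The final "." produces a trailing empty string, so drop empty items.
input_words = [word for word in re.split(separators, text) if word]
print(input_words)
```

This should print the list the question asks for, with 'C#' left intact because # is not in the separator class.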

Upvotes: 1

Nagaraju

Reputation: 1875

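You can also do word tokenization with the nltk module. It needs NLTK's tokenizer data to be downloaded first, and it keeps punctuation as separate tokens: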

#!/usr/bin/python3
import nltk

sentence = """At eight o'clock on Thursday morning
... Arthur didn't feel very good."""
words = nltk.tokenize.word_tokenize(sentence)
print(words)

output:

['At', 'eight', "o'clock", 'on', 'Thursday', 'morning', '...', 'Arthur', 'did', "n't", 'feel', 'very', 'good', '.']

Upvotes: 0

Tim Biegeleisen

Reputation: 521457

You should be splitting on a regex character class containing all those symbols:

input_words = re.split(r'[\s,;:.!()"\'\\/\[\]]', input())
print(input_words)

This is a literal answer to your question. Because it splits on every single separator character, adjacent separators (such as a comma followed by a space) produce empty strings. The solution you probably want is to split on runs of one or more separators, whitespace included, e.g.

text = "A B ; C.D   ! E[F] G"
input_words = re.split(r'[\s,;:.!()"\'\\/\[\]]+', text)
print(input_words)

Prints:

['A', 'B', 'C', 'D', 'E', 'F', 'G']
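An alternative sketch, not taken from the answer above: instead of splitting on separators, collect the runs of characters that are not separators with re.findall. This sidesteps empty strings from leading or trailing separators without a filtering step. The helper name tokenize is hypothetical, and the separator set is assumed to be the one listed in the question.

```python
import re

def tokenize(text):
    """Return the runs of non-separator characters found in text.

    Hypothetical helper; the negated class [^...] matches anything
    that is NOT whitespace or one of the listed punctuation marks.
    """
    return re.findall(r'[^\s,;:.!()"\'\\/\[\]]+', text)

print(tokenize("A B ; C.D   ! E[F] G"))  # ['A', 'B', 'C', 'D', 'E', 'F', 'G']
print(tokenize("...etc."))               # ['etc']
```

Whether to split or to match is mostly a matter of taste here; matching tends to be easier to reason about when the input may start or end with punctuation.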

Upvotes: 6
