hamidfzm

Reputation: 4695

Split on more than one word or character in Python

How can I write a Python program that splits a string on more than one word or character? For example, I have these sentences: Hi, This is a test. Are you surprised? In this example I need my program to split the sentences on ',', '!', '?' and '.'. I know about str.split and NLTK, but is there a built-in, Pythonic way to do this, like split?

Upvotes: 1

Views: 204

Answers (4)

inspectorrr

Reputation: 17

def get_words(s):
    l = []
    w = ''
    for c in s:
        if c in '-!?,. ':
            if w != '': 
                l.append(w)
            w = ''
        else:
            w = w + c
    if w != '': 
        l.append(w)
    return l



>>> s = "Hi, This is a test. Are you surprised?"
>>> print(get_words(s))
['Hi', 'This', 'is', 'a', 'test', 'Are', 'you', 'surprised']


If you change '-!?,. ' to '-!?,.' (dropping the trailing space), the text is no longer split on spaces and the output will be:
['Hi', ' This is a test', ' Are you surprised']

Upvotes: 0

hamidfzm

Reputation: 4695

I think I found a trick that answers my question without using any modules. I can use the replace method of str to replace characters like ! or ? with ., and then use the split method to split the text on . alone.
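
A minimal sketch of this replace-then-split idea, assuming the four delimiters from the question. The function name split_by_replace and its parameters are hypothetical, chosen only for illustration:

```python
def split_by_replace(text, delimiters=",!?", target="."):
    # Normalize every delimiter to one target character (hypothetical helper).
    for d in delimiters:
        text = text.replace(d, target)
    # Split once on the target, dropping empty pieces and surrounding spaces.
    return [part.strip() for part in text.split(target) if part.strip()]


sentences = split_by_replace("Hi, This is a test. Are you surprised?")
print(sentences)  # ['Hi', 'This is a test', 'Are you surprised']
```

This only works cleanly when every delimiter is a fixed string; for patterns, a regular expression is probably a better fit.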

Upvotes: 1

Raydel Miranda

Reputation: 14370

You are looking for the tokenize module of the NLTK package. NLTK stands for Natural Language Toolkit.

Or try re.split from the re module.

From the re documentation:

>>> import re
>>> re.split(r'\W+', 'Words, words, words.')
['Words', 'words', 'words', '']
>>> re.split(r'(\W+)', 'Words, words, words.')
['Words', ', ', 'words', ', ', 'words', '.', '']
>>> re.split(r'\W+', 'Words, words, words.', maxsplit=1)
['Words', 'words, words.']
>>> re.split('[a-f]+', '0a3B9', flags=re.IGNORECASE)
['0', '3', '9']

Upvotes: 1

Andrey Shokhin

Reputation: 12220

Use re.split:

import re

string = 'Hi, This is a test. Are you surprised?'
words = re.split(r'[,!?.]', string)
print(words)
['Hi', ' This is a test', ' Are you surprised', '']
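
Note that the result keeps the leading spaces and ends with an empty string, because the text finishes with a delimiter. A small sketch of one way to clean that up, assuming you want trimmed, non-empty pieces (the variable names are illustrative):

```python
import re

text = "Hi, This is a test. Are you surprised?"
# Split on any of the four punctuation marks, then trim each piece
# and discard the ones that are empty after trimming.
parts = [p.strip() for p in re.split(r"[,!?.]", text) if p.strip()]
print(parts)  # ['Hi', 'This is a test', 'Are you surprised']
```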

Upvotes: 3
