humberthumbert116

Reputation: 1

In Python, how do I take in a string of text and return a list of lists of strings?

This function takes a string text and returns a list of lists of strings, one list for each sentence in text.

Sentences are separated by one of the strings ".", "?", or "!". We ignore the possibility of other punctuation separating sentences, so 'Mr.X' will become two sentences, and 'don't' will be two words.

For example, the text is

Hello, Jack.  How is it going?  Not bad; pretty good, actually...  Very very
good, in fact.

And the function returns:

[['hello', 'jack'],
 ['how', 'is', 'it', 'going'],
 ['not', 'bad', 'pretty', 'good', 'actually'],
 ['very', 'very', 'good', 'in', 'fact']]

The most confusing part is how to make the function detect the characters , . ! ? and how to turn the result into a list of lists containing the words of each sentence. Thank you.

Upvotes: 0

Views: 435

Answers (2)

lukevp

Reputation: 715

This sounds very much like a homework problem to me, so I'll provide general tips instead of exact code.

A string has a split(sep) method. You can use it to split your string on a specific separator. However, split() only accepts one separator at a time, so you will have to use a loop and perform the split multiple times.
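A minimal sketch of the loop-and-split tip, assuming only the three terminators from the question and treating every non-alphanumeric character (including the apostrophe) as a word boundary. The names `split_sentences`, `words_in` and `sentence_words` are hypothetical helpers chosen for illustration.

```python
def split_sentences(text):
    # Hypothetical helper: str.split() takes one separator at a time, so split
    # every piece again for each terminator in turn (order is preserved).
    pieces = [text]
    for sep in ".?!":
        next_pieces = []
        for piece in pieces:
            next_pieces.extend(piece.split(sep))
        pieces = next_pieces
    return pieces


def words_in(sentence):
    # Hypothetical helper: turn every non-alphanumeric character into a space,
    # then str.split() with no argument breaks on runs of whitespace.
    cleaned = "".join(ch if ch.isalnum() else " " for ch in sentence.lower())
    return cleaned.split()


def sentence_words(text):
    # Hypothetical top-level function: one list of lowercase words per sentence.
    result = []
    for sentence in split_sentences(text):
        words = words_in(sentence)
        if words:  # skip empty pieces, e.g. between the dots of "..."
            result.append(words)
    return result


text = "Hello, Jack.  How is it going?  Not bad; pretty good, actually...  Very very\ngood, in fact."
print(sentence_words(text))
# [['hello', 'jack'], ['how', 'is', 'it', 'going'], ['not', 'bad', 'pretty', 'good', 'actually'], ['very', 'very', 'good', 'in', 'fact']]
```

Because the apostrophe is not alphanumeric, 'don't' comes out as two words, as the question requires.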

You could also use a regular expression to find matches (that would be a better solution). That would let you find all matches at once. Then you would iterate over the matches and split them on spaces, while stripping out punctuation.
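One possible sketch of the "find all matches, then split on spaces" idea. It assumes that replacing every ASCII punctuation character (`string.punctuation`) with a space counts as "stripping out punctuation"; that choice also makes 'don't' two words, as the question asks. `sentence_words` is a hypothetical name.

```python
import re
import string

# Translation table that maps every ASCII punctuation character to a space.
PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def sentence_words(text):
    # Hypothetical function name; returns one list of lowercase words per sentence.
    result = []
    # Each match is a maximal run of characters that contains no terminator.
    for sentence in re.findall(r"[^.?!]+", text):
        # Replace punctuation with spaces, lowercase, then split on whitespace.
        words = sentence.translate(PUNCT_TO_SPACE).lower().split()
        if words:  # ignore matches that held only whitespace
            result.append(words)
    return result
```

Calling `sentence_words` on the example text from the question should give the same four lists shown there.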

Edit: Here's an example of a regular expression you could use to get sentence groups all at once:

\s*([^.?!]+)\s*

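The \s* in front of the parentheses keeps the whitespace that follows the previous terminator out of the result, and the parentheses form a capture group. (Whitespace directly before a terminator can still end up inside the group, because [^.?!] also matches spaces, but splitting into words afterwards takes care of that.) You can use re.findall() to get a list of all captured results, and then loop over these items and use re.split() and some conditional logic to append all the words to a new list.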
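A sketch of those steps using the exact pattern above, assuming lowercase output (as in the question) and `\W+` as the word separator for re.split(); the function name `sentence_words` is hypothetical.

```python
import re


def sentence_words(text):
    # Hypothetical function name; returns one list of lowercase words per sentence.
    result = []
    # With one capture group, re.findall returns just the captured text of each match.
    for sentence in re.findall(r"\s*([^.?!]+)\s*", text):
        words = []
        # \W+ splits on runs of non-word characters (spaces, commas, apostrophes, ...).
        for word in re.split(r"\W+", sentence):
            if word:  # re.split yields '' when the capture starts or ends with a separator
                words.append(word.lower())
        if words:  # skip captures that were whitespace only
            result.append(words)
    return result
```

On the question's example text this should return the four expected lists; the `if word` and `if words` checks are the "conditional logic" that discards empty strings.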

Let me know how you get along with that, and if you have any other questions please provide us the code you have so far.

Upvotes: 4

Kasravnd

Reputation: 107287

You can use re.split():

>>> import re
>>> s = "Hello, Jack.  How is it going?  Not bad; pretty good, actually...  Very very good, in fact."
>>> my_s = [re.split(r'\W', i) for i in re.split(r'\.|\?|\!', s.lower()) if len(i)]

and to remove the empty strings left behind by adjacent separators (such as ", "), you can do this:

>>> [[x for x in i if len(x)] for i in my_s]
[['hello', 'jack'], ['how', 'is', 'it', 'going'], ['not', 'bad', 'pretty', 'good', 'actually'], ['very', 'very', 'good', 'in', 'fact']]
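Since the question asks for a function, here is one way to package the two re.split() steps into one. It is a sketch: the name `sentence_words` is hypothetical, it lowercases to match the question's expected output, and it assumes that pieces containing no word characters (for example the whitespace after a final ".") should be dropped rather than kept as empty lists.

```python
import re


def sentence_words(text):
    # Hypothetical function name wrapping the two re.split() steps from this answer.
    # Step 1: split into sentences on '.', '?' or '!' (same pattern as above).
    sentences = re.split(r'\.|\?|\!', text.lower())
    # Step 2: split each sentence on single non-word characters and drop the ''
    # entries that adjacent separators produce; skip pieces with no word characters.
    return [[w for w in re.split(r'\W', sent) if w]
            for sent in sentences
            if re.search(r'\w', sent)]
```

Using `re.search(r'\w', sent)` instead of `len(sent)` as the filter means a trailing space or newline after the last terminator does not produce an empty inner list.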

Upvotes: 1