Shaida Muhammad
Reputation: 1650

Python split function won't work on list to generate list of list

I am learning Python and tried the following experiment.

    text = "this is line one . this is line two . this is line three ."
    
    tokens = text.split(" ")            # split text into tokens on the separator "space"
    lioftokens = tokens.split(".")      # split tokens into a list of token lists on the separator "dot"
    
    print(tokens)                       # output = ['this', 'is', 'line', 'one', '.', 'this', 'is', 'line', 'two', '.', 'this', 'is', 'line', 'three', '.']
    print(lioftokens)                   # expected output = [['this', 'is', 'line', 'one', '.'],
                                        #                    ['this', 'is', 'line', 'two', '.'],
                                        #                    ['this', 'is', 'line', 'three', '.']]

It raises an error instead of producing the expected output.

I understand that split() works on strings, not on lists. How can I solve this?

#IamNewToPython

Upvotes: 1

Views: 246

Answers (4)

U13-Forward
Reputation: 71580

Try using a list comprehension:

text = "this is line one . this is line two . this is line three ."
print([line.rstrip().split() for line in text.split('.') if line])

Output:

[['this', 'is', 'line', 'one'], ['this', 'is', 'line', 'two'], ['this', 'is', 'line', 'three']]

If you want to keep the separators, try:

import re
text = "this is line one . this is line two . this is line three ."
print([line.rstrip().split() for line in re.split(r'([^.]*\.)', text) if line])

Output:

[['this', 'is', 'line', 'one', '.'], ['this', 'is', 'line', 'two', '.'], ['this', 'is', 'line', 'three', '.']]

Edit:

If you want to split the list of tokens directly, try:

l = ['this', 'is', 'line', 'one', '.', 'this', 'is', 'line', 'two', '.', 'this', 'is', 'line', 'three', '.']
newl = [[]]
for i in l:
    newl[-1].append(i)
    if i == '.':
        newl.append([])
print(newl)

Output:

[['this', 'is', 'line', 'one', '.'], ['this', 'is', 'line', 'two', '.'], ['this', 'is', 'line', 'three', '.'], []]
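The trailing empty list in the output above appears because a new sublist is opened after every '.', including the last one. A minimal sketch that avoids it is shown below; `split_list` is a hypothetical helper name chosen for illustration, not a built-in.

```python
def split_list(tokens, sep="."):
    """Group tokens into sublists, closing each sublist at `sep` (separator kept)."""
    groups = []
    current = []
    for tok in tokens:
        current.append(tok)
        if tok == sep:
            # The separator ends the current group.
            groups.append(current)
            current = []
    if current:
        # Keep any leftover tokens that follow the last separator.
        groups.append(current)
    return groups


tokens = "this is line one . this is line two . this is line three .".split()
print(split_list(tokens))
```

Because a group is appended only once it is complete, no empty list is left at the end.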

Upvotes: 2

Samsul Islam
Reputation: 2609

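Use the str.split() method once, then slice the token list into chunks of five. This works only because every sentence here has exactly five tokens, counting the ".".

text = "this is line one . this is line two . this is line three ."

tokens = text.split()
print([tokens[i:i+5] for i in range(0, len(tokens), 5)])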
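The fixed-size slicing above depends on every sentence having five tokens. If sentence lengths vary, one possible alternative (a sketch, not part of the original answer) is itertools.groupby, which groups consecutive tokens by whether they are the separator. The sample text below is made up for illustration, and this version drops the "." tokens.

```python
from itertools import groupby

# Made-up sample with sentences of different lengths.
text = "this is line one . this is a longer line two . three ."
tokens = text.split()

# groupby yields runs of consecutive tokens sharing the same key;
# the key is True for the "." separator and False for ordinary words.
sentences = [
    list(group)
    for is_sep, group in groupby(tokens, key=lambda t: t == ".")
    if not is_sep
]
print(sentences)
```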

Upvotes: 0

cactus
Reputation: 447

text = "this is line one . this is line two . this is line three ."

# first split on the periods
sentences = text.split('.')

for s in sentences:
    # chop off trailing whitespace and then split on spaces
    print(s.rstrip().split())

Upvotes: 0

Jarvis
Reputation: 8564

This works:

>>> text = "this is line one . this is line two . this is line three ."
>>> list(filter(None, map(str.split, text.split("."))))
[['this', 'is', 'line', 'one'],
 ['this', 'is', 'line', 'two'],
 ['this', 'is', 'line', 'three']]

Split the string on "." first, then map str.split over each resulting substring. filter(None, ...) drops the empty list that the trailing "." produces.
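To see what each step contributes, the one-liner can be broken apart as in the sketch below. The intermediate names (`pieces`, `split_pieces`, `result`) are illustrative only.

```python
text = "this is line one . this is line two . this is line three ."

# Step 1: split on "."; the text ends with ".", so the last piece is ''.
pieces = text.split(".")

# Step 2: split each piece on whitespace; the empty piece becomes [].
split_pieces = list(map(str.split, pieces))

# Step 3: filter(None, ...) keeps only truthy items, so the empty list is dropped.
result = list(filter(None, split_pieces))
print(result)
```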

Upvotes: 0