Reputation: 1650
I am learning Python and tried the following experiment.
text = "this is line one . this is line two . this is line three ."
tokens = text.split(" ")  # split text into tokens with separator "space"
lioftokens = tokens.split(".")  # split tokens into lists of tokens with separator "dot"
print(tokens) # output = ['this', 'is', 'line', 'one', '.', 'this', 'is', 'line', 'two', '.', 'this', 'is', 'line', 'three', '.']
print(lioftokens) # expected output = [['this', 'is', 'line', 'one', '.'],
# ['this', 'is', 'line', 'two', '.'],
# ['this', 'is', 'line', 'three', '.']]
It gives an error instead of the expected output, because split() is a method for strings, not for lists.
How should I solve it?
#IamNewToPython
Upvotes: 1
Views: 246
Reputation: 71580
Try using a list comprehension:
text = "this is line one . this is line two . this is line three ."
print([line.rstrip().split() for line in text.split('.') if line])
Output:
[['this', 'is', 'line', 'one'], ['this', 'is', 'line', 'two'], ['this', 'is', 'line', 'three']]
If you want to keep the separators, try:
import re
text = "this is line one . this is line two . this is line three ."
print([line.rstrip().split() for line in re.split(r'([^.]*\.)', text) if line])
Output:
[['this', 'is', 'line', 'one', '.'], ['this', 'is', 'line', 'two', '.'], ['this', 'is', 'line', 'three', '.']]
Edit:
If you want to split the list of tokens directly, try:
l = ['this', 'is', 'line', 'one', '.', 'this', 'is', 'line', 'two', '.', 'this', 'is', 'line', 'three', '.']
newl = [[]]
for i in l:
    newl[-1].append(i)
    if i == '.':
        newl.append([])
print(newl)
Output:
[['this', 'is', 'line', 'one', '.'], ['this', 'is', 'line', 'two', '.'], ['this', 'is', 'line', 'three', '.'], []]
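The trailing empty list appears because the input ends with a '.', so a new sublist is opened that never receives any tokens. Here is a minimal sketch of one way to avoid it: only store a group once it is complete. The helper name split_after is hypothetical and not part of the standard library.

```python
def split_after(tokens, sep="."):
    """Group tokens into sublists, closing each sublist right after `sep`.

    `split_after` is a hypothetical helper name used for illustration.
    """
    groups = []
    current = []
    for tok in tokens:
        current.append(tok)
        if tok == sep:
            # The separator closes the current group.
            groups.append(current)
            current = []
    if current:
        # Keep leftover tokens that were not followed by a separator.
        groups.append(current)
    return groups


tokens = "this is line one . this is line two . this is line three .".split()
result = split_after(tokens)
print(result)
```

Since a group is only appended when it contains tokens, no empty list is left at the end, and tokens after the last separator are still kept.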
Upvotes: 2
Reputation: 2609
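Use the str.split() method and slice the result into chunks of five tokens (this assumes every sentence has exactly five tokens, including the dot):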
text = "this is line one . this is line two . this is line three ."
tokens = text.split()  # split once and reuse the result
print([tokens[i:i+5] for i in range(0, len(tokens), 5)])
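Keep in mind that slicing in steps of five only works when every sentence has the same number of tokens. A small sketch to illustrate the limitation (the helper name chunk and the second sample text are made up for this example):

```python
def chunk(tokens, size):
    """Cut a list into consecutive slices of at most `size` items.

    `chunk` is a hypothetical helper name used for illustration.
    """
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


# Every sentence has five tokens, so the chunks line up with the sentences.
even = "this is line one . this is line two .".split()
chunks_even = chunk(even, 5)
print(chunks_even)

# Sentences of different lengths: the chunks no longer follow the dots.
uneven = "short line . this is a longer line .".split()
chunks_uneven = chunk(uneven, 5)
print(chunks_uneven)
```

In the second case the first chunk runs past the first dot, so for text with sentences of varying length it is safer to split on the dot itself.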
Upvotes: 0
Reputation: 447
text = "this is line one . this is line two . this is line three ."
# first split on the periods
sentences = text.split('.')
for s in sentences:
    # chop off trailing whitespace and then split on spaces
    print(s.rstrip().split())
Upvotes: 0
Reputation: 8564
This works:
>>> text = "this is line one . this is line two . this is line three ."
>>> list(filter(None, map(str.split, text.split("."))))
[['this', 'is', 'line', 'one'],
['this', 'is', 'line', 'two'],
['this', 'is', 'line', 'three']]
You can simply split the string on "." first, then map str.split over each individual string in the resulting list. filter(None, ...) then drops the empty list that comes from the text after the final ".".
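If the one-liner is hard to read, here is a rough step-by-step equivalent using list comprehensions. The variable names are only for illustration.

```python
text = "this is line one . this is line two . this is line three ."

# Step 1: split on the dot. The text ends with ".", so the last piece is ''.
pieces = text.split(".")

# Step 2: split each piece on whitespace. An empty piece becomes [].
split_pieces = [piece.split() for piece in pieces]

# Step 3: drop the empty lists, which is what filter(None, ...) does.
result = [words for words in split_pieces if words]
print(result)

# The one-liner from the answer, for comparison.
one_liner = list(filter(None, map(str.split, text.split("."))))
print(result == one_liner)
```

Calling split() with no arguments also discards the leading and trailing spaces of each piece, so no separate strip step is needed.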
Upvotes: 0