I have a word list like the one below, and I want to split it into sublists at each '.'. Is there a better or more useful way to do this in Python 3?
a = ['this', 'is', 'a', 'cat', '.', 'hello', '.', 'she', 'is', 'nice', '.']
result = []
tmp = []
for elm in a:
    if elm != '.':          # compare by value; `is not` would check object identity
        tmp.append(elm)
    else:
        result.append(tmp)
        tmp = []
print(result)
# result: [['this', 'is', 'a', 'cat'], ['hello'], ['she', 'is', 'nice']]
I added test cases and updated the code to handle them correctly:
a = ['this', 'is', 'a', 'cat', '.', 'hello', '.', 'she', 'is', 'nice', '.']
b = ['this', 'is', 'a', 'cat', '.', 'hello', '.', 'she', 'is', 'nice', '.', 'yes']
c = ['.', 'this', 'is', 'a', 'cat', '.', 'hello', '.', 'she', 'is', 'nice', '.', 'yes']

def split_list(list_data, split_word='.'):
    result = []
    sub_data = []
    for elm in list_data:
        if elm != split_word:
            sub_data.append(elm)
        else:
            if sub_data:            # skip empty groups (e.g. from a leading '.')
                result.append(sub_data)
            sub_data = []
    if sub_data:                    # keep the words after the last '.'
        result.append(sub_data)
    return result

print(split_list(a)) # [['this', 'is', 'a', 'cat'], ['hello'], ['she', 'is', 'nice']]
print(split_list(b)) # [['this', 'is', 'a', 'cat'], ['hello'], ['she', 'is', 'nice'], ['yes']]
print(split_list(c)) # [['this', 'is', 'a', 'cat'], ['hello'], ['she', 'is', 'nice'], ['yes']]
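A generator-based variant of split_list is sketched below (not from the original post; the name iter_split is hypothetical). It yields each sublist lazily instead of building the whole result up front, which may help with long inputs, and it compares with != rather than is not, since is checks object identity rather than string equality.

```python
def iter_split(items, sep='.'):
    """Lazily yield the non-empty runs of items between occurrences of sep."""
    chunk = []
    for item in items:
        if item != sep:      # compare by value, not identity
            chunk.append(item)
        elif chunk:          # separator reached: emit the run collected so far
            yield chunk
            chunk = []       # rebind, so the yielded list is never mutated later
    if chunk:                # words after the last separator
        yield chunk


a = ['this', 'is', 'a', 'cat', '.', 'hello', '.', 'she', 'is', 'nice', '.']
b = a + ['yes']       # same as the question's list b
c = ['.'] + b         # same as the question's list c

for sentence in iter_split(c):
    print(sentence)
```

Wrap the call in list(...) when you need all sublists at once.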
Using itertools.groupby:
from itertools import groupby

a = ['this', 'is', 'a', 'cat', '.', 'hello', '.', 'she', 'is', 'nice', '.']
result = [
    list(g)
    for k, g in groupby(a, lambda x: x == '.')
    if not k
]
print(result)
#[['this', 'is', 'a', 'cat'], ['hello'], ['she', 'is', 'nice']]
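Wrapping the groupby answer in a function makes it easy to check against the question's test lists. This is a sketch; split_groupby is a hypothetical name. Because groupby groups consecutive items with equal keys, a run of several '.' in a row is treated as a single separator, so no empty sublists appear.

```python
from itertools import groupby


def split_groupby(items, sep='.'):
    # groupby yields (key, group) pairs for consecutive runs: key is True for a
    # run of separators and False for a run of words; keep only the word runs.
    return [list(g) for is_sep, g in groupby(items, lambda x: x == sep) if not is_sep]


a = ['this', 'is', 'a', 'cat', '.', 'hello', '.', 'she', 'is', 'nice', '.']
b = a + ['yes']
c = ['.'] + b

print(split_groupby(b))
print(split_groupby(c))
```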
You can do this all with a "one-liner" using a list comprehension and the string methods join, split, and strip, with no additional libraries.
a = ['this', 'is', 'a', 'cat', '.', 'hello', '.', 'she', 'is', 'nice', '.']
b = ['this', 'is', 'a', 'cat', '.', 'hello', '.', 'she', 'is', 'nice', '.', 'yes']
c = ['.', 'this', 'is', 'a', 'cat', '.', 'hello', '.', 'she', 'is', 'nice', '.', 'yes']
In [5]: [i.strip().split(' ') for i in ' '.join(a).split('.') if len(i) > 0 ]
Out[5]: [['this', 'is', 'a', 'cat'], ['hello'], ['she', 'is', 'nice']]
In [8]: [i.strip().split(' ') for i in ' '.join(b).split('.') if len(i) > 0 ]
Out[8]: [['this', 'is', 'a', 'cat'], ['hello'], ['she', 'is', 'nice'], ['yes']]
In [9]: [i.strip().split(' ') for i in ' '.join(c).split('.') if len(i) > 0 ]
Out[9]: [['this', 'is', 'a', 'cat'], ['hello'], ['she', 'is', 'nice'], ['yes']]
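Or, more concisely (str.split() with no arguments splits on runs of whitespace and ignores leading and trailing whitespace, so strip isn't needed):
[s.split() for s in ' '.join(a).split('.') if s]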
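One caveat with the join/split approach: it assumes that no word itself contains a space or a '.'. The word list below is made up to illustrate what happens otherwise; a list-based approach such as groupby compares whole items and keeps such words intact.

```python
from itertools import groupby

# Hypothetical input: one word contains a '.', another contains a space.
words = ['Mr.', 'Smith', 'left', '.', 'ice cream', '.']

# String-based one-liner: 'Mr.' is split apart and 'ice cream' becomes two words.
joined = [s.split() for s in ' '.join(words).split('.') if s]

# List-based groupby: each item is compared to '.' as a whole.
grouped = [list(g) for k, g in groupby(words, lambda x: x == '.') if not k]

print(joined)   # [['Mr'], ['Smith', 'left'], ['ice', 'cream']]
print(grouped)  # [['Mr.', 'Smith', 'left'], ['ice cream']]
```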
This answer requires installing a third-party library, iteration_utilities[1]. Its split function makes solving this task straightforward:
>>> from iteration_utilities import split
>>> a = ['this', 'is', 'a', 'cat', '.', 'hello', '.', 'she', 'is', 'nice', '.']
>>> list(filter(None, split(a, '.', eq=True)))
[['this', 'is', 'a', 'cat'], ['hello'], ['she', 'is', 'nice']]
Instead of using the eq parameter, you can also pass a custom function that decides where to split:
>>> list(filter(None, split(a, lambda x: x=='.')))
[['this', 'is', 'a', 'cat'], ['hello'], ['she', 'is', 'nice']]
If you want to keep the '.'s, you can also use the keep_before argument:
>>> list(filter(None, split(a, '.', eq=True, keep_before=True)))
[['this', 'is', 'a', 'cat', '.'], ['hello', '.'], ['she', 'is', 'nice', '.']]
Note that the library just makes this easier; it's easily possible (see the other answers) to accomplish this task without installing an additional library.
The filter can be removed if you don't expect '.' to appear at the beginning or end of the list you want to split.
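If you'd rather avoid the dependency, keeping each '.' at the end of its chunk (like the keep_before output above for a) can be sketched with a plain loop. split_keep_before is a hypothetical helper, not part of iteration_utilities, and this sketch has not been checked against the library's behavior for every edge case.

```python
def split_keep_before(items, sep='.'):
    """Stdlib-only sketch: end a chunk after each sep, keeping sep in that chunk."""
    result, chunk = [], []
    for item in items:
        chunk.append(item)
        if item == sep:        # the separator closes the current chunk
            result.append(chunk)
            chunk = []
    if chunk:                  # trailing words without a closing separator
        result.append(chunk)
    return result


a = ['this', 'is', 'a', 'cat', '.', 'hello', '.', 'she', 'is', 'nice', '.']
print(split_keep_before(a))
# [['this', 'is', 'a', 'cat', '.'], ['hello', '.'], ['she', 'is', 'nice', '.']]
```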
[1] I'm the author of that library. It's available via pip or from the conda-forge channel with conda.
I couldn't help myself; I just wanted to have fun with this great question:
import itertools

a = ['this', 'is', 'a', 'cat', '.', 'hello', '.', 'she', 'is', 'nice', '.']
b = ['this', 'is', 'a', 'cat', '.', 'hello', '.', 'she', 'is', 'nice', '.', 'yes']
c = ['.', 'this', 'is', 'a', 'cat', '.', 'hello', '.', 'she', 'is', 'nice', '.', 'yes']

def split_dots(lst):
    # start positions: index 0 plus the index just after every '.'
    dots = [0] + [i + 1 for i, e in enumerate(lst) if e == '.']
    # from each start position, take words until the next '.'
    result = [list(itertools.takewhile(lambda x: x != '.', lst[dot:])) for dot in dots]
    # drop the empty groups produced by a leading or trailing '.'
    return list(filter(None, result))

print(split_dots(a)) # [['this', 'is', 'a', 'cat'], ['hello'], ['she', 'is', 'nice']]
print(split_dots(b)) # [['this', 'is', 'a', 'cat'], ['hello'], ['she', 'is', 'nice'], ['yes']]
print(split_dots(c)) # [['this', 'is', 'a', 'cat'], ['hello'], ['she', 'is', 'nice'], ['yes']]
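split_dots slices lst[dot:] for every start position, which copies the tail of the list each time. A possible single-pass variant, sketched below with the hypothetical name split_dots_linear, shares one iterator between all the takewhile calls: takewhile consumes the '.' that stops it, so the next call starts right after that separator.

```python
from itertools import takewhile


def split_dots_linear(lst, sep='.'):
    it = iter(lst)
    result = []
    # Each takewhile call consumes one run of words *and* the separator that
    # ends it, so there are exactly lst.count(sep) + 1 runs to read.
    for _ in range(lst.count(sep) + 1):
        chunk = list(takewhile(lambda x: x != sep, it))
        if chunk:  # skip empty runs (leading, trailing, or doubled separators)
            result.append(chunk)
    return result


a = ['this', 'is', 'a', 'cat', '.', 'hello', '.', 'she', 'is', 'nice', '.']
b = a + ['yes']
c = ['.'] + b
print(split_dots_linear(c))
```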
Here's another way using only standard list operations (with no dependencies on other libraries!). First we find the split points and then we create sublists around them; notice that the first element is treated as a special case:
a = ['this', 'is', 'a', 'cat', '.', 'hello', '.', 'she', 'is', 'nice', '.']
# positions of every '.', with -1 as a sentinel for the start of the list
indexes = [-1] + [i for i, x in enumerate(a) if x == '.']
result = [a[indexes[i]+1:indexes[i+1]] for i in range(len(indexes)-1)]
print(result)
# => [['this', 'is', 'a', 'cat'], ['hello'], ['she', 'is', 'nice']]
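As written, this drops any words after the last '.' (the 'yes' in the question's list b) and yields an empty first sublist for list c, which starts with '.'. Adding len(a) as a closing sentinel and filtering out empty slices is one way to cover those cases; split_by_index below is a hypothetical name for that sketch.

```python
def split_by_index(a, sep='.'):
    # Sentinels: -1 before the start and len(a) after the end, so the words
    # after the last separator (if any) form a final slice too.
    indexes = [-1] + [i for i, x in enumerate(a) if x == sep] + [len(a)]
    slices = [a[start + 1:end] for start, end in zip(indexes, indexes[1:])]
    # Drop empty slices caused by leading, trailing, or adjacent separators.
    return [s for s in slices if s]


b = ['this', 'is', 'a', 'cat', '.', 'hello', '.', 'she', 'is', 'nice', '.', 'yes']
c = ['.'] + b
print(split_by_index(c))
```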
You can reconstruct the string using ' '.join and then split it with a regex:
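import re

a = ['this', 'is', 'a', 'cat', '.', 'hello', '.', 'she', 'is', 'nice', '.']
# split on '.' plus surrounding whitespace, then split each sentence on whitespace;
# all(b) drops the [''] left over from the trailing '.'
new_s = [b for b in [re.split(r'\s', i) for i in re.split(r'\s*\.\s*', ' '.join(a))] if all(b)]
print(new_s)
Output:
[['this', 'is', 'a', 'cat'], ['hello'], ['she', 'is', 'nice']]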
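The same regex idea extends to several sentence-ending tokens at once. In the sketch below, split_on_any and the default separators ('.', '!', '?') are illustrative assumptions; re.escape makes each separator match literally inside the alternation. Like the other join-based answers, it assumes no word contains whitespace or a separator character.

```python
import re


def split_on_any(words, seps=('.', '!', '?')):
    # Build a pattern like r'\s*(?:\.|!|\?)\s*' so each separator is literal.
    pattern = r'\s*(?:' + '|'.join(map(re.escape, seps)) + r')\s*'
    parts = re.split(pattern, ' '.join(words))
    chunks = (p.split() for p in parts)
    return [chunk for chunk in chunks if chunk]  # drop empty pieces


words = ['hi', '!', 'how', 'are', 'you', '?', 'fine', '.']
print(split_on_any(words))  # [['hi'], ['how', 'are', 'you'], ['fine']]
```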