rahlf23

Reputation: 9019

Python split list into multiple shorter lists when reaching a capitalized word

I want to be able to split a list of items when reaching a capitalized word, for example:

Input:

s = ['HARRIS', 'second', 'caught', 'JONES', 'third', 'Smith', 'stole', 'third']

Output:

['HARRIS', 'second', 'caught']
['JONES', 'third']
['Smith', 'stole', 'third']

Would it be best to approach this problem using s.index('some regex') and then split the list accordingly at those given indices?

Upvotes: 1

Views: 119

Answers (4)

ZRTSIM

Reputation: 75

str.istitle("Abc") #True
str.istitle("ABC") #False
str.istitle("ABc") #False

str.isupper("Abc") #False
str.isupper("ABC") #True
str.isupper("ABc") #False

So I think checking whether the first letter of the string is uppercase will help you:

a = "Abc"
print(str.isupper(a[0]))

or

a = "Abc"
print(a[0].isupper())
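
To relate these checks to the list in the question, here is a minimal sketch comparing the three tests on a few of its words. The helper name `starts_with_capital` is an assumption made for illustration, not something from the question.

```python
words = ['HARRIS', 'JONES', 'Smith', 'second']

def starts_with_capital(word):
    # Hypothetical helper: True when the first character is uppercase.
    # Slicing with word[:1] avoids an IndexError on an empty string.
    return word[:1].isupper()

for word in words:
    # Columns: word, istitle(), isupper(), first-letter check
    print(word, word.istitle(), word.isupper(), starts_with_capital(word))
```

Only the first-letter check is true for 'HARRIS', 'JONES' and 'Smith' alike, which is the set of split points the question asks for.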

Upvotes: 0

Chris

Reputation: 22963

If you're willing to use a third-party library, you can use iteration_utilities.Iterable to accomplish this easily:

>>> from iteration_utilities import Iterable
>>> 
>>> lst = ['HARRIS', 'second', 'caught', 'JONES', 'third', 'Smith', 'stole', 'third']
>>> Iterable(lst).split(str.isupper, keep_after=True).filter(lambda l: l).as_list()
[['HARRIS', 'second', 'caught'], ['JONES', 'third', 'Smith', 'stole', 'third']]
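
Note that 'Smith' stays in the second group because str.isupper is true only when every cased character is uppercase. If a third-party dependency is not wanted, the same splitting can be sketched with the standard library alone. The function name `split_before` and the first-letter predicate are assumptions chosen for this example.

```python
def split_before(items, is_start):
    """Yield sublists, starting a new one at each item where is_start(item) is true."""
    group = []
    for item in items:
        # Close the current group before a matching item, unless it is empty
        if is_start(item) and group:
            yield group
            group = []
        group.append(item)
    if group:
        yield group

lst = ['HARRIS', 'second', 'caught', 'JONES', 'third', 'Smith', 'stole', 'third']

# Split before any word whose first letter is uppercase
result = list(split_before(lst, lambda w: w[:1].isupper()))
print(result)
```

With the first-letter predicate this should produce the three groups from the question; passing str.isupper instead gives the two groups shown above.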

Upvotes: 1

delta

Reputation: 3818

A straightforward way is to iterate over the list: when we find a capitalized word, we start a new sublist; otherwise we append to the current one.

s = ['HARRIS', 'second', 'caught', 'JONES', 'third', 'Smith', 'stole', 'third', 'H']

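def split_by(lst, p):
    lsts = []
    for x in lst:
        # Start a new sublist at each match, or when no sublist exists yet
        # (avoids an IndexError if the first item does not match)
        if p(x) or not lsts:
            lsts.append([x])
        else:
            lsts[-1].append(x)
    return lsts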

print(split_by(s, str.isupper))

Upvotes: 0

Ajax1234

Reputation: 71461

You can try this:

s = ['HARRIS', 'second', 'caught', 'JONES', 'third', 'Smith', 'stole', 'third']

indices = [i for i, a in enumerate(s) if a[0].isupper()]

indices.append(len(s))

final_list = [s[indices[i]:indices[i+1]] for i in range(len(indices)-1)]

Output:

[['HARRIS', 'second', 'caught'], ['JONES', 'third'], ['Smith', 'stole', 'third']]

Note that this solution only splits at elements whose first letter is uppercase.

If you want to split at any element that contains an uppercase letter anywhere:

s = ['HARRIS', 'second', 'caught', 'JONES', 'third', 'Smith', 'stole', 'third']

indices = [i for i, a in enumerate(s) if any(b.isupper() for b in a)]

indices.append(len(s))

final_list = [s[indices[i]:indices[i+1]] for i in range(len(indices)-1)]
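
Both variants give the same result for the list in the question, so the difference only shows on other input. The sketch below wraps the index-based approach in a function and runs both predicates on a made-up list; the function name `split_at_matches` and the sample words are assumptions for illustration.

```python
def split_at_matches(items, is_start):
    # Positions where a new group begins, plus the end of the list
    starts = [i for i, item in enumerate(items) if is_start(item)]
    starts.append(len(items))
    # Pair each start with the next one to get the slice bounds
    return [items[a:b] for a, b in zip(starts, starts[1:])]

s = ['intro', 'HARRIS', 'second', 'mcDonald', 'third']

first_letter = split_at_matches(s, lambda w: w[:1].isupper())
any_letter = split_at_matches(s, lambda w: any(c.isupper() for c in w))

print(first_letter)  # [['HARRIS', 'second', 'mcDonald', 'third']]
print(any_letter)    # [['HARRIS', 'second'], ['mcDonald', 'third']]
```

In both cases 'intro' is dropped: with this approach, anything before the first matching element does not appear in the result.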

Upvotes: 2
