Valeria S.

Reputation: 348

Extracting all uppercase words following each other from string

Out of a string like this "A B c de F G A" I would like to get the following list: ["A B", "F G A"]. That means, I need to get all the sequences of uppercase words.

I tried something like this:

text = "A B c de F G A"
result = []
for i, word in enumerate(text.split()):
    if word[0].isupper():
        s = ""
        while word[0].isupper():

            s += word
            i += 1
            word = text[i]

        result.append(s)

But it produces the following output: ['A', 'BB', 'F', 'G', 'A']

I suppose it happens because you can't skip a list element by just incrementing i. How can I avoid this situation and get the right output?

Upvotes: 0

Views: 4081

Answers (4)

Veera Samantula

Reputation: 165

The following example will extract all uppercase words following each other from a string:

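import re

string = "A B c de F G A"

# Split on runs of lowercase letters, then drop the empty/whitespace-only pieces.
# Note: '[a-z]*' can match the empty string, so on Python 3.7+ it splits between every character.
[val.strip() for val in re.split(r'[a-z]+', string) if val.strip()]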

Upvotes: 0

Olivier Melançon

Reputation: 22314

You can use re.split to split a string with a regex.

import re

def get_upper_sequences(s):
    return re.split(r'\s+[a-z][a-z\s]*', s)

Example

>>> get_upper_sequences("A B c de F G A")
['A B', 'F G A']
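
One caveat with splitting on the lowercase runs: an input that ends in a lowercase word leaves an empty string in the result, and a leading lowercase word is not removed, because the pattern requires whitespace before it. A minimal sketch of an alternative that matches the uppercase runs directly with re.findall might look like this. The name find_upper_runs and the pattern are illustrative assumptions, not part of the answer above, and the pattern only covers ASCII letters.

```python
import re

# Hypothetical helper: matches runs of all-uppercase ASCII words directly,
# instead of splitting on the lowercase words between them.
UPPER_RUN = re.compile(r'\b[A-Z]+\b(?:\s+[A-Z]+\b)*')

def find_upper_runs(s):
    # The group is non-capturing, so findall returns each full match:
    # one maximal sequence of whitespace-separated uppercase words.
    return UPPER_RUN.findall(s)

print(find_upper_runs("A B c de F G A"))
print(find_upper_runs("a B C d"))
```

Since it collects matches rather than splitting, lowercase words at either end of the string do not produce empty or stray entries.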

Upvotes: 1

pault

Reputation: 43504

Here is a solution without itertools or re:

def findTitles(text):
    filtered = " ".join([x if x.istitle() else " " for x in text.split()])
    return [y.strip() for y in filtered.split("  ") if y]

print(findTitles(text="A B c de F G A"))
#['A B', 'F G A']

print(findTitles(text="A Bbb c de F G A"))
#['A Bbb', 'F G A']

Upvotes: 0

Ajax1234

Reputation: 71451

You can use itertools.groupby:

import itertools
s = "A B c de F G A"
new_s = [' '.join(b) for a, b in itertools.groupby(s.split(), key=str.isupper) if a]

Output:

['A B', 'F G A']
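
As for why the loop in the question misbehaves: text[i] indexes characters of the string rather than words, and reassigning i inside the for loop does not advance enumerate. A minimal sketch of a plain-loop version that keeps a running buffer is below. The name upper_sequences is hypothetical, and it uses the same str.isupper test as the groupby version.

```python
def upper_sequences(text):
    result = []
    current = []  # uppercase words collected for the run in progress
    for word in text.split():
        if word.isupper():
            current.append(word)
        elif current:
            # A non-uppercase word ends the current run.
            result.append(" ".join(current))
            current = []
    if current:
        # Flush a run that reaches the end of the string.
        result.append(" ".join(current))
    return result

print(upper_sequences("A B c de F G A"))
```

Note that str.isupper is False for a word like "Bbb"; testing word[0].isupper() instead would accept capitalized words as well.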

Upvotes: 7
