Tolu

Reputation: 121

Split a sentence on capital letters

How can I split this string?

'Symptoms may include:Absent or small knucklesCleft palateDecreased skin creases at finger jointsDeformed earsDroopy eyelidsInability to fully extend the joints from birth (contracture deformity)Narrow shouldersPale skinTriple-jointed thumbs'

The desired output should take this form:

Symptoms may include:
Absent or small knuckles
Cleft palate
Decreased skin creases at finger joints
Deformed ears
Droopy eyelids
Inability to fully extend the joints from birth (contracture deformity)
Narrow shoulders
Pale skin
Triple-jointed thumbs

In other words, split on capital letters.

Upvotes: 1

Views: 397

Answers (3)

Mayukh Sarkar

Reputation: 2625

I think the following code does what you want:

import re

# inputString holds the text from the question
output = re.sub(r"([A-Z])", r"\n\1", inputString)
print(output)

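You can also store the result in a list by splitting on \n:

outputList = output.split('\n')[1:]

The re.sub call inserts a \n before every capital letter; it does not replace the letter itself. Because the string starts with a capital letter, the output begins with a \n, so the first element of the split is an empty string, and the [1:] slice drops it.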
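As a minimal sketch of this approach on a shortened input (the names input_string and output_list are illustrative, not from the answer), the leading newline can also be stripped before splitting, which avoids relying on the first list element being empty:

```python
import re

# Shortened version of the string from the question (illustrative input).
input_string = "Symptoms may include:Absent or small knucklesCleft palate"

# Insert a newline before every uppercase ASCII letter.
output = re.sub(r"([A-Z])", r"\n\1", input_string)

# The input starts with a capital, so output begins with "\n".
# Strip it before splitting so no empty first element is produced.
output_list = output.lstrip("\n").split("\n")
print(output_list)
```

If the input did not start with a capital letter, lstrip would leave it untouched, whereas a fixed [1:] slice would silently discard the first fragment.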

Upvotes: -1

cs95

Reputation: 402533

Use re.findall (pattern improved thanks to @Brendan Abel and @JFF):

import re

# text holds the string from the question
fragments = re.findall(r'[A-Z][^A-Z]*', text)

print(fragments)
['Symptoms may include:',
 'Absent or small knuckles',
 'Cleft palate',
 'Decreased skin creases at finger joints',
 'Deformed ears',
 'Droopy eyelids',
 'Inability to fully extend the joints from birth (contracture deformity)',
 'Narrow shoulders',
 'Pale skin',
 'Triple-jointed thumbs']

Details

[A-Z]      # match must begin with an uppercase char
[^A-Z]*    # the rest of the match must not contain an uppercase char

Note: * also lets you capture fragments that consist of a single uppercase character with nothing after it. Substitute + if that is not the desired behavior.
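To illustrate the difference between the two quantifiers, here is a small sketch on a made-up input (the string "ABc deFg" and the variable names are assumptions chosen for the example):

```python
import re

# Made-up input with two adjacent capitals at the start.
text = "ABc deFg"

# With *, a capital followed directly by another capital still matches alone.
with_star = re.findall(r"[A-Z][^A-Z]*", text)

# With +, a capital needs at least one non-capital after it, so "A" is skipped.
with_plus = re.findall(r"[A-Z][^A-Z]+", text)

print(with_star)  # ['A', 'Bc de', 'Fg']
print(with_plus)  # ['Bc de', 'Fg']
```

Note that either pattern ignores any text that comes before the first uppercase letter, which is not an issue for the string in the question because it starts with one.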

Also, if you want your output as a multiline string:

print('\n'.join(fragments))

Upvotes: 6

John1024

Reputation: 113844

>>> s = 'Symptoms may include:Absent or small knucklesCleft palateDecreased skin creases at finger jointsDeformed earsDroopy eyelidsInability to fully extend the joints from birth (contracture deformity)Narrow shouldersPale skinTriple-jointed thumbs'
>>> print(''.join(('\n' + c if c.isupper() else c) for c in s)[1:])
Symptoms may include:
Absent or small knuckles
Cleft palate
Decreased skin creases at finger joints
Deformed ears
Droopy eyelids
Inability to fully extend the joints from birth (contracture deformity)
Narrow shoulders
Pale skin
Triple-jointed thumbs

How it works

  • (('\n' + c if c.isupper() else c) for c in s)

    This generator expression yields each character c of string s unchanged, except when c is upper case, in which case it yields c with a newline prepended.

  • ''.join(('\n' + c if c.isupper() else c) for c in s)

    This joins the generated pieces back together into a single string.

  • ''.join(('\n' + c if c.isupper() else c) for c in s)[1:]

    This removes the extra newline from the beginning of the string. It works here because s starts with an upper-case letter.
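The three steps above can be sketched separately on a shortened input (the intermediate names pieces, joined, and result are illustrative, and a list comprehension is used instead of a generator so the pieces can be inspected):

```python
# Shortened input taken from the end of the question's string.
s = "Pale skinTriple-jointed thumbs"

# Step 1: prepend a newline to each upper-case character.
pieces = ['\n' + c if c.isupper() else c for c in s]

# Step 2: join the pieces back into one string.
joined = ''.join(pieces)

# Step 3: drop the first character, which is the newline added before "P".
result = joined[1:]

print(result)
```

The [1:] slice assumes the first character of s is upper case. For input that might start otherwise, joined.lstrip('\n') would be a safer way to remove the leading newline.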

Upvotes: 2
