Reputation: 121
How can I split this?
'Symptoms may include:Absent or small knucklesCleft palateDecreased skin creases at finger jointsDeformed earsDroopy eyelidsInability to fully extend the joints from birth (contracture deformity)Narrow shouldersPale skinTriple-jointed thumbs'
The desired output should look like this:
Symptoms may include:
Absent or small knuckles
Cleft palate
Decreased skin creases at finger joints
Deformed ears
Droopy eyelids
Inability to fully extend the joints from birth (contracture deformity)
Narrow shoulders
Pale skin
Triple-jointed thumbs
In other words, I want to split the string before each capital letter.
Upvotes: 1
Views: 397
Reputation: 2625
The following code should do it:
import re
# Insert a newline before every uppercase letter (inputString is the string from the question)
output = re.sub(r"([A-Z])", r"\n\1", inputString)
print(output)
You can also store the result in a list by splitting on \n:
outputList = output.split('\n')[1:]  # skip the empty string before the first newline
This first replaces each capital letter with a \n followed by that same letter, so every capitalized phrase starts on a new line.
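As an alternative to substituting and then splitting, `re.split` can split directly on a zero-width lookahead. This is a different technique from the answer above (a sketch, assuming Python 3.7 or newer, which is when `re.split` started accepting empty matches); the function name `split_on_capitals` is hypothetical.

```python
import re
from typing import List


def split_on_capitals(text: str) -> List[str]:
    # (?=[A-Z]) is a zero-width lookahead: it matches the empty position
    # just before each uppercase ASCII letter, so no characters are consumed.
    # Splitting on empty matches requires Python 3.7 or newer.
    parts = re.split(r"(?=[A-Z])", text)
    # A match at position 0 yields a leading empty string; drop empties.
    return [part for part in parts if part]


s = "Symptoms may include:Absent or small knucklesCleft palate"
print(split_on_capitals(s))
# ['Symptoms may include:', 'Absent or small knuckles', 'Cleft palate']
```

Like the `[1:]` slice above, the filter handles the empty string produced when the text begins with a capital, but it also works when the text does not.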
Upvotes: -1
Reputation: 402533
Use re.findall (pattern improved thanks to @Brendan Abel and @JFF):
import re
fragments = re.findall(r'[A-Z][^A-Z]*', text)  # text is the string from the question
print(fragments)
['Symptoms may include:',
'Absent or small knuckles',
'Cleft palate',
'Decreased skin creases at finger joints',
'Deformed ears',
'Droopy eyelids',
'Inability to fully extend the joints from birth (contracture deformity)',
'Narrow shoulders',
'Pale skin',
'Triple-jointed thumbs']
Details
[A-Z]     # match must begin with an uppercase char
[^A-Z]*   # further characters in the match must not be uppercase
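The annotated pattern above maps directly onto Python's `re.VERBOSE` flag, which ignores whitespace and `#` comments inside the pattern. A minimal sketch of the same regex written that way (the name `FRAGMENT` is just for illustration):

```python
import re

# The same pattern as '[A-Z][^A-Z]*', written with re.VERBOSE so the
# explanation lives inside the pattern itself.
FRAGMENT = re.compile(
    r"""
    [A-Z]     # match must begin with an uppercase char
    [^A-Z]*   # further characters must not be uppercase
    """,
    re.VERBOSE,
)

print(FRAGMENT.findall("Deformed earsDroopy eyelids"))
# ['Deformed ears', 'Droopy eyelids']
```

Whitespace in the subject string (the space in "Deformed ears") is still matched normally; `re.VERBOSE` only affects how the pattern is parsed.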
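Note: * lets a fragment consist of a single uppercase character (for example, each letter of an acronym such as "DNA"). Substitute + if that is not the desired behavior; a capital that is not followed by at least one non-uppercase character is then skipped instead of becoming its own fragment.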
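A small comparison of the two quantifiers on a hypothetical input containing consecutive capitals:

```python
import re

sample = "DNA test"  # hypothetical input with consecutive capitals

# '*' allows zero further characters, so each capital in "DNA" can be a fragment
star = re.findall(r"[A-Z][^A-Z]*", sample)
print(star)  # ['D', 'N', 'A test']

# '+' requires at least one non-uppercase character after the capital,
# so 'D' and 'N' cannot start a match and are skipped entirely
plus = re.findall(r"[A-Z][^A-Z]+", sample)
print(plus)  # ['A test']
```

With `+` the letters "D" and "N" disappear from the output altogether, so `*` is the safer default unless you specifically want to discard lone capitals.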
Also, if you want your output as a multiline string:
print('\n'.join(fragments))
Upvotes: 6
Reputation: 113844
>>> s = 'Symptoms may include:Absent or small knucklesCleft palateDecreased skin creases at finger jointsDeformed earsDroopy eyelidsInability to fully extend the joints from birth (contracture deformity)Narrow shouldersPale skinTriple-jointed thumbs'
>>> print(''.join(('\n' + c if c.isupper() else c) for c in s)[1:])
Symptoms may include:
Absent or small knuckles
Cleft palate
Decreased skin creases at finger joints
Deformed ears
Droopy eyelids
Inability to fully extend the joints from birth (contracture deformity)
Narrow shoulders
Pale skin
Triple-jointed thumbs
(('\n' + c if c.isupper() else c) for c in s)
This generator expression yields each character c of the string s unchanged, except that when c is uppercase it yields a newline followed by c.
''.join(('\n' + c if c.isupper() else c) for c in s)
This joins the pieces back together into a single string.
''.join(('\n' + c if c.isupper() else c) for c in s)[1:]
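The [1:] slice removes the extra newline from the beginning of the string. This assumes s starts with an uppercase letter; otherwise the slice drops a real character instead.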
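The three steps can be traced on a short hypothetical string, followed by a sketch of a variant (the name `split_before_uppercase` is hypothetical) that builds a list of fragments directly and so does not need the `[1:]` slice. Because it uses `str.isupper()` like the answer above, it also splits before non-ASCII capitals such as "É", which `[A-Z]` would miss.

```python
from typing import List

s = "AbCd"  # short hypothetical input

# Step 1: prefix each capital with "\n" (a list here so it can be printed)
pieces = [("\n" + c if c.isupper() else c) for c in s]
print(pieces)  # ['\nA', 'b', '\nC', 'd']

# Step 2: join the pieces into one string
joined = "".join(pieces)
print(repr(joined))  # '\nAb\nCd'

# Step 3: drop the leading newline
print(repr(joined[1:]))  # 'Ab\nCd'


def split_before_uppercase(text: str) -> List[str]:
    """Split text before every uppercase character.

    Unlike the [1:] slice, this does not assume text starts with a capital.
    """
    fragments: List[str] = []
    current = ""
    for c in text:
        if c.isupper() and current:
            # A new capital closes the fragment built so far
            fragments.append(current)
            current = c
        else:
            current += c
    if current:
        fragments.append(current)
    return fragments


print(split_before_uppercase("intro:Pale skinÉtat civil"))
# ['intro:', 'Pale skin', 'État civil']
```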
Upvotes: 2