Reputation: 3
For my homework, I am trying to capitalize the first word of each sentence.
This is for Python 3.7.
def fix_cap():
    if "." in initialInput:
        sentsplit = initialInput.split(". ")
        capsent = [x.capitalize() for x in sentsplit]
        joinsent = ". ".join(capsent)
        print("Number of words capitalized: " + str(len(sentsplit)))
        print("Edited text: " + joinsent)
    elif "!" in initialInput:
        sentsplit = initialInput.split("! ")
        capsent = [x.capitalize() for x in sentsplit]
        joinsent = "! ".join(capsent)
        print("Number of words capitalized: " + str(len(sentsplit)))
        print("Edited text: " + joinsent)
    elif "?" in initialInput:
        sentsplit = initialInput.split("? ")
        capsent = [x.capitalize() for x in sentsplit]
        joinsent = "? ".join(capsent)
        print("Number of words capitalized: " + str(len(sentsplit)))
        print("Edited text: " + joinsent)
    else:
        print(initialInput.capitalize())
This will work if only one type of punctuation is used, but I would like it to work with multiple types in a paragraph.
Upvotes: 0
Views: 98
Reputation: 82919
Correctly splitting a text into sentences is hard. For how to handle cases such as abbreviations or names with titles, please refer to other questions on this site, e.g. this one. What follows is only a very simple version based on your conditions, which I assume will suffice for your task.
As you noticed, your code only works for one type of punctuation because of the if/elif/else construct. But you do not need that at all! If, for example, there is no ? in the text, then split("? ") will just return the text as a whole (wrapped in a list). You could simply remove the conditions, or iterate over a list of possible sentence-ending punctuation. Note, however, that capitalize will not just upper-case the first letter but also lower-case all the rest, e.g. names, acronyms, or words previously capitalized for a different type of punctuation. Instead, you could upper-case just the first character and keep the rest.
text = "text with. multiple types? of sentences! more stuff."
for sep in (". ", "? ", "! "):
    # s[:1] rather than s[0], so empty pieces (e.g. from "a. . b") do not raise IndexError
    text = sep.join(s[:1].upper() + s[1:] for s in text.split(sep))
print(text)
# Text with. Multiple types? Of sentences! More stuff.
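To see the two behaviours mentioned above in isolation, here is a small sketch. The sample string and the variable names are made up for illustration only:

```python
sample = "ask NASA. they know"

# A separator that does not occur leaves the whole text in a one-element list.
no_split = sample.split("? ")

# str.capitalize() upper-cases the first character and lower-cases everything else.
lowered = sample.capitalize()

# Upper-casing only the first character keeps the rest untouched.
# Slicing with [:1] instead of indexing with [0] also tolerates an empty string.
kept = sample[:1].upper() + sample[1:]

print(no_split)  # ['ask NASA. they know']
print(lowered)   # Ask nasa. they know
print(kept)      # Ask NASA. they know
```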
You could also use a regular expression to split on all sentence separators at once. This way, you might even be able to use capitalize, although it will still lower-case names and acronyms.
>>> import re
>>> ''.join(s.capitalize() for s in re.split(r"([\?\!\.] )", text))
'Text with. Multiple types? Of sentences! More stuff.'
Or using re.sub with a look-behind (note the first char is still lower-case):
>>> re.sub(r"(?<=[\?\!\.] ).", lambda m: m.group().upper(), text)
'text with. Multiple types? Of sentences! More stuff.'
However, unless you know what those are doing, I'd suggest going with the first loop-based version.
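If you do want the regex route, one possible way to also catch the very first character is to add a start-of-string alternative to the look-behind, and re.subn reports how many replacements were made, which could stand in for the "words capitalized" count. This is only a rough sketch: the name capitalize_sentences is my own, and it assumes sentences are separated by exactly ". ", "? " or "! " as in the examples above.

```python
import re

# Start of the text, or a position right after ". ", "? " or "! ",
# followed by one word character.
SENTENCE_START = re.compile(r"(?:^|(?<=[.?!] ))\w")

def capitalize_sentences(text):
    """Return (new_text, count); count also includes starts that were already upper-case."""
    return SENTENCE_START.subn(lambda m: m.group().upper(), text)

new_text, count = capitalize_sentences("text with. multiple types? of NASA! more stuff.")
print(new_text)  # Text with. Multiple types? Of NASA! More stuff.
print(count)     # 4
```

Unlike capitalize, this leaves the rest of each sentence alone, so "NASA" stays as it is.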
Upvotes: 2