Reputation: 404
I have text in which some sentences start with a lowercase letter. I need to find them and convert them to correct sentence case. Some of the punctuation is also incorrect: a sentence may start right after a full stop, with no space in between.
For example:
.this sentence
and this.also this. and this.This one is not.
should be replaced with:
.This sentence
And this.Also this. And this.This one is not.
A Sublime Text 3 solution, a regex, or a Python NLTK solution would be suitable.
I tried the following solution, but it is slow and does not find sentences that have no space after the full stop.
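from nltk.tokenize import sent_tokenize  # needs the Punkt tokenizer data (via nltk.download)

text = """kjdshkjhf. this sentence
and this.also this. and this. This one is not."""

# Split the text into sentences and print those that start with a lowercase letter.
for sentence in sent_tokenize(text):
    if sentence[0].islower():
        print(sentence)
        print("****")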
Upvotes: 2
Views: 293
Reputation: 37775
You can use this pattern, which matches the first lowercase letter on each line (skipping any leading non-letters):
^([^a-zA-Z]*)([a-z])
and use $1\U$2
as the substitution.
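Update: If you also want to capture the first lowercase letter after each . (period), you can use this:
^([^a-zA-Z]*)([a-z])|(\.\s*)([a-z])
Since this pattern has four groups and only one alternative matches at a time, the substitution would presumably need to reference all of them, e.g. $1$3\U$2$4 (assuming unmatched groups expand to empty strings in your editor).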
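If you would rather do this in Python than in the editor, here is a minimal sketch that applies the same pattern with the standard `re` module. The function name `fix_sentence_case` and the helper `upper_first` are just illustrative, and I am assuming `re.MULTILINE` so that `^` matches at the start of every line, as it does in Sublime Text. Python's `re` does not support `\U` in replacement strings, so the uppercasing is done in a callback instead.

```python
import re

# Groups 1/2: leading non-letters at the start of a line, then a lowercase letter.
# Groups 3/4: a period plus optional whitespace, then a lowercase letter.
PATTERN = re.compile(r"^([^a-zA-Z]*)([a-z])|(\.\s*)([a-z])", re.MULTILINE)


def fix_sentence_case(text):
    """Uppercase the first letter of each line and of each sentence after a period."""
    def upper_first(match):
        # Only one alternative matches at a time; the other groups are None.
        if match.group(2) is not None:
            return match.group(1) + match.group(2).upper()
        return match.group(3) + match.group(4).upper()

    return PATTERN.sub(upper_first, text)


sample = ".this sentence\nand this.also this. and this.This one is not."
print(fix_sentence_case(sample))
```

Note that this will also capitalize the letter after abbreviations such as "e.g. foo", because the pattern only looks for a period followed by a lowercase letter.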
Upvotes: 1