I'm trying to split a text into separate sentences in Python and save them, using sentence-ending punctuation such as . ? ! as the delimiters. But some texts contain decimal points, and re.split treats those as sentence-ending periods too. How can I get around that? Any help would be appreciated!
Example text:
A 0.75-in-diameter steel tension rod is 4.8 ft long and carries a load of 13.5 kip. Find the tensile stress, the total deformation, the unit strains, and the change in the rod diameter.
This will depend on your input, but if you can assume that every period you want to split at is followed by a space (and that decimal points never are), you can simply do:
>>> s = 'A 0.75-in-diameter steel tension rod is 4.8 ft long and carries a load of 13.5 kip. Find the tensile stress, the total deformation, the unit strains, and the change in the rod diameter.'
>>> s.split('. ')
['A 0.75-in-diameter steel tension rod is 4.8 ft long and carries a load of 13.5 kip', 'Find the tensile stress, the total deformation, the unit strains, and the change in the rod diameter.']
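One limitation worth seeing: str.split only accepts a single literal separator, so sentences ending in ? or ! are not split at all. The input string below is a made-up example, not from the question:

```python
# Hypothetical input (not from the question) mixing '.', '?' and '!' endings.
text = 'Is the rod 4.8 ft long? Yes! It carries 13.5 kip. Find the stress.'

# str.split only knows the one literal separator '. ', so '? ' and '! '
# are not treated as split points; '4.8' and '13.5' are safe because
# no space follows their period.
parts = text.split('. ')
print(parts)
```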
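For anything more complicated than that (for example, sentences that end in ! or ?), you'll probably want a regex. Because it still requires whitespace after the punctuation, decimals like 0.75 are left alone: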
import re
# Split wherever ., ! or ? is followed by whitespace (the punctuation itself is dropped)
re.split(r'[.!?]\s+', s)
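If you want to keep the punctuation attached to each sentence, a lookbehind is one option. This is a minimal sketch, assuming every sentence boundary is punctuation followed by whitespace; the helper name split_sentences is hypothetical, and abbreviations such as "e.g. " would still be split incorrectly:

```python
import re

# Split on whitespace only when it is preceded by '.', '!' or '?'.
# The lookbehind (?<=...) is zero-width, so the punctuation stays with
# its sentence; decimals such as "0.75" are untouched because no
# whitespace follows their period.
SENTENCE_END = re.compile(r'(?<=[.!?])\s+')

def split_sentences(text):
    """Return the non-empty sentences in text, punctuation preserved."""
    return [part for part in SENTENCE_END.split(text.strip()) if part]

s = ('A 0.75-in-diameter steel tension rod is 4.8 ft long and carries a load '
     'of 13.5 kip. Find the tensile stress, the total deformation, the unit '
     'strains, and the change in the rod diameter.')

for sentence in split_sentences(s):
    print(sentence)
```

The returned list can be stored or written out one sentence per line as needed.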