spaCy fails to properly parse medical text

Question

I have recently run into some issues while splitting medical text into sentences with spaCy. Could you explain why these issues arise?

If the last word of a sentence is a single character and is followed by a period, the end of the sentence is not recognized. For example:

There was no between-treatment difference in preoperative or postoperative hemodynamics or in release of troponin I. (NO SPLIT HERE) Preoperative oral coenzyme Q(10) therapy in patients undergoing cardiac surgery increases myocardial and cardiac mitochondrial coenzyme Q(10) levels, improves mitochondrial efficiency, and increases myocardial tolerance to in vitro hypoxia-reoxygenation stress.

Another issue is the character sequence +/-, which is treated as the end of a sentence. For instance, a single sentence is split into several sentences, as shown below:

VO(2max) decreased significantly by 3.6 +/-
2.1, 14 +/-
2.5, and 27.4 +/-
3.6% in TW, and by 5 +/-
4, 9.4 +/-
6.4, and 18.7 +/-
7% in SW at 1000, 2500, and 4500 m, respectively.

All of the above should be one single sentence!

Sometimes a sentence is split between a word and a special character (or between two special characters, or between a number and a word shorter than 3 characters). For example:

The survival rates for patients receiving left ventricular assist devices (n = 68) versus patients receiving optimal medical management (n = 61) were 52% versus 28% at 1 year and 29% versus 13% at 2 years SPLITS HERE ( P = .008, log-rank test).
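For reference, here is a minimal sketch of how the splits above can be inspected. It assumes the `en_core_web_sm` model is installed; the model name is only an assumption for illustration, and the helper names `split_sentences`, `format_sentences`, and `reproduce` are hypothetical. Results may differ with other models or spaCy versions.

```python
def split_sentences(text, model_name="en_core_web_sm"):
    """Return the sentences spaCy detects in `text`.

    `model_name` is an assumption made for this sketch.
    """
    # Imported here so that format_sentences() can be used without spaCy.
    import spacy

    nlp = spacy.load(model_name)
    return [sent.text.strip() for sent in nlp(text).sents]


def format_sentences(sentences):
    """Number each sentence so unexpected boundaries are easy to spot."""
    return "\n".join(f"[{i}] {s}" for i, s in enumerate(sentences, start=1))


def reproduce(text):
    """Print one numbered line per sentence detected by spaCy."""
    print(format_sentences(split_sentences(text)))
```

Calling `reproduce(...)` with any of the passages above prints each detected sentence on its own numbered line, which makes a missing or extra boundary visible at a glance.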

Thank you very much!

