Reputation: 31963
This is the source text I want to parse:
1 From fairest creatures we desire increase, That thereby beauty's rose might never die, But as the riper should by time decease, His tender heir might bear his memory: But thou contracted to thine own bright eyes, Feed'st thy light's flame with self-substantial fuel, Making a famine where abundance lies, Thy self thy foe, to thy sweet self too cruel: Thou that art now the world's fresh ornament, And only herald to the gaudy spring, Within thine own bud buriest thy content, And tender churl mak'st waste in niggarding: Pity the world, or else this glutton be, To eat the world's due, by the grave and thee. 2 When forty winters shall besiege thy brow, And dig deep trenches in thy beauty's field, Thy youth's proud livery so gazed on now, Will be a tattered weed of small worth held: Then being asked, where all thy beauty lies, Where all the treasure of thy lusty days; To say within thine own deep sunken eyes, Were an all-eating shame, and thriftless praise. How much more praise deserved thy beauty's use, If thou couldst answer 'This fair child of mine Shall sum my count, and make my old excuse' Proving his beauty by succession thine. This were to be new made when thou art old, And see thy blood warm when thou feel'st it cold. 3 Look in thy glass and tell the face thou viewest, Now is the time that face should form another, Whose fresh repair if now thou not renewest, Thou dost beguile the world, unbless some mother. For where is she so fair whose uneared womb Disdains the tillage of thy husbandry? Or who is he so fond will be the tomb, Of his self-love to stop posterity? Thou art thy mother's glass and she in thee Calls back the lovely April of her prime, So thou through windows of thine age shalt see, Despite of wrinkles this thy golden time. But if thou live remembered not to be, Die single and thine image dies with thee.
I want to parse it into chunks like this:
The first chunk should be:
From fairest creatures we desire increase, That thereby beauty's rose might never die, But as the riper should by time decease, His tender heir might bear his memory: But thou contracted to thine own bright eyes, Feed'st thy light's flame with self-substantial fuel, Making a famine where abundance lies, Thy self thy foe, to thy sweet self too cruel: Thou that art now the world's fresh ornament, And only herald to the gaudy spring, Within thine own bud buriest thy content, And tender churl mak'st waste in niggarding: Pity the world, or else this glutton be, To eat the world's due, by the grave and thee.
The second:
When forty winters shall besiege thy brow, And dig deep trenches in thy beauty's field, Thy youth's proud livery so gazed on now, Will be a tattered weed of small worth held:
Then being asked, where all thy beauty lies, Where all the treasure of thy lusty days; To say within thine own deep sunken eyes, Were an all-eating shame, and thriftless praise.
The third:
How much more praise deserved thy beauty's use, If thou couldst answer 'This fair child of mine Shall sum my count, and make my old excuse' Proving his beauty by succession thine. This were to be new made when thou art old, And see thy blood warm when thou feel'st it cold.
…and so on. Every time a sentence ends with .
I want that part to be a new chunk.
How can I parse like this? I want some guidelines for a clear and efficient way to do this. I do no want to go character by character and do some checking…
Thanks
Upvotes: 1
Views: 503
Reputation: 33918
You could probably split it with something like:
re.split(r"(?:^|(?:[^\S\n]*\n){2}(?m)^)[ \t]+\d+[ \t]+[\r\n]+", text)
Upvotes: 1
Reputation: 2050
If you don't want to check character by character, and this is EXACTLY the source you have, you can check line by line, and search for empty ones.
Depending on implementation, I am not sure it would be much more efficient thus. Possibly the opposite.
Upvotes: 1