Reputation:
Suppose I have the following text:
s = '\n\nPART I, WHERE I’M COMING FROM\n\n1\xa0My Call to Adventure: 1949–1967\n2\xa0Crossing the Threshold: 1967–1979\n3\xa0My Abyss: 1979–1982\n4\xa0My Road of Trials: 1983–1994\n5\xa0The Ultimate Boon: 1995–2010\n6\xa0Returning the Boon: 2011–2015\n7\xa0My Last Year and My Greatest Challenge: 2016–2017\n8\xa0Looking Back from a Higher Level\n\nPART II, LIFE PRINCIPLES\n\n1\xa0Embrace Reality and Deal with It\n2\xa0Use the 5-Step Process to Get What You Want Out of Life\n3\xa0Be Radically Open-Minded\n4\xa0Understand That People Are Wired Very Differently\n5\xa0Learn How to Make Decisions Effectively\nLife Principles: Putting It All Together\nSummary and Table of Life Principles\n\nPART III, WORK PRINCIPLES\n\nSummary and Table of Work Principles\nTO GET THE CULTURE RIGHT\n\nTO GET THE PEOPLE\n\nTO BUILD AND EVOLVE YOUR \nWork Principles: Putting It All Together\n\n'
I split it on the delimiter PART:
In [14]: parts = re.split(r'\n\nPART',s)
In [15]: parts
Out[15]:
['',
' I, WHERE I’M COMING FROM\n\n1\xa0My Call to Adventure: 1949–1967\n2\xa0Crossing the Threshold: 1967–1979\n3\xa0My Abyss: 1979–1982\n4\xa0My Road of Trials: 1983–1994\n5\xa0The Ultimate Boon: 1995–2010\n6\xa0Returning the Boon: 2011–2015\n7\xa0My Last Year and My Greatest Challenge: 2016–2017\n8\xa0Looking Back from a Higher Level',
' II, LIFE PRINCIPLES\n\n1\xa0Embrace Reality and Deal with It\n2\xa0Use the 5-Step Process to Get What You Want Out of Life\n3\xa0Be Radically Open-Minded\n4\xa0Understand That People Are Wired Very Differently\n5\xa0Learn How to Make Decisions Effectively\nLife Principles: Putting It All Together\nSummary and Table of Life Principles',
' III, WORK PRINCIPLES\n\nSummary and Table of Work Principles\nTO GET THE CULTURE RIGHT\n\nTO GET THE PEOPLE\n\nTO BUILD AND EVOLVE YOUR \nWork Principles: Putting It All Together\n\n']
Then I add the prefix PART back to each item in the list:
In [16]: ['PART' + i for i in parts if i]
Out[16]:
['PART I, WHERE I’M COMING FROM\n\n1\xa0My Call to Adventure: 1949–1967\n2\xa0Crossing the Threshold: 1967–1979\n3\xa0My Abyss: 1979–1982\n4\xa0My Road of Trials: 1983–1994\n5\xa0The Ultimate Boon: 1995–2010\n6\xa0Returning the Boon: 2011–2015\n7\xa0My Last Year and My Greatest Challenge: 2016–2017\n8\xa0Looking Back from a Higher Level',
'PART II, LIFE PRINCIPLES\n\n1\xa0Embrace Reality and Deal with It\n2\xa0Use the 5-Step Process to Get What You Want Out of Life\n3\xa0Be Radically Open-Minded\n4\xa0Understand That People Are Wired Very Differently\n5\xa0Learn How to Make Decisions Effectively\nLife Principles: Putting It All Together\nSummary and Table of Life Principles',
'PART III, WORK PRINCIPLES\n\nSummary and Table of Work Principles\nTO GET THE CULTURE RIGHT\n\nTO GET THE PEOPLE\n\nTO BUILD AND EVOLVE YOUR \nWork Principles: Putting It All Together\n\n']
I would like to do this in one step:
In [17]: parts = re.findall(r'\n\nPART.+', s)
In [18]: parts
Out[18]:
['\n\nPART I, WHERE I’M COMING FROM',
'\n\nPART II, LIFE PRINCIPLES',
'\n\nPART III, WORK PRINCIPLES']
# The dot stops at \n, so I tried to get past the line breaks by putting a quantifier on the group (repeating the match across lines):
In [20]: parts = re.findall(r'\n\n(?:PART.+)+', s)
In [21]: parts
Out[21]:
['\n\nPART I, WHERE I’M COMING FROM',
'\n\nPART II, LIFE PRINCIPLES',
'\n\nPART III, WORK PRINCIPLES']
# Unfortunately, it produces the same output.
How can I get each whole part, including its PART prefix, in a single step?
Upvotes: 2
Views: 383
Reputation: 521639
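Try splitting on a positive lookahead to retain the delimiter, using the third-party regex
module (the lookahead matches an empty string, and VERSION1 lets regex.split split on empty matches):
import regex
print(regex.split(r"(?=\n\nPART)", s, flags=regex.VERSION1))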
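If you would rather stay with the standard library, here is a minimal sketch of the same lookahead idea using re. It assumes PART only ever appears at the start of a section, right after a blank line. The sample string is a shortened, made-up stand-in for s, and the names sample, split_parts and found_parts are just for illustration. The second variant shows how to fix the findall attempt from the question: re.DOTALL lets the dot match \n as well.

```python
import re

# Shortened stand-in for the question's string `s` (hypothetical sample data).
sample = (
    "\n\nPART I, FIRST\n\n1\xa0Alpha\n2\xa0Beta"
    "\n\nPART II, SECOND\n\n1\xa0Gamma\nSummary\n\n"
)

# Approach 1: consume the blank line, but only where PART follows it.
# The lookahead leaves PART in the next chunk; drop the empty leading item.
split_parts = [p for p in re.split(r"\n\n(?=PART)", sample) if p]

# Approach 2: findall with DOTALL so "." also matches "\n"; the lazy ".+?"
# stops just before the next "\n\nPART" or at the end of the string.
found_parts = re.findall(r"PART.+?(?=\n\nPART|\Z)", sample, flags=re.DOTALL)

print(split_parts)
print(found_parts)
```

Both lists hold one string per part, each starting with PART, so no separate step is needed to add the prefix back. Because the first pattern consumes the two newlines, it never splits on an empty match and does not depend on how re.split treats empty matches.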
Upvotes: 2