Hung Dinh Nguyen

Reputation: 53

How to write a regular expression to use with re.split in Python

I have a string like this:

----------

FT Weekend

----------

Why do we run marathons?
Are marathons and cycling races about more than exercise? What does the 
literature of endurance tell us about our thirst for self-imposed hardship? 

I want to delete everything from the first ---------- through the next ---------- (inclusive).

I have been using re.sub:

pattern = r"-+\n.+\n-+"
re.sub(pattern, '', thestring)

Upvotes: 2

Views: 215

Answers (3)

Wiktor Stribiżew

Reputation: 626927

The problem with your regex (-+\n.+\n-+) is twofold. First, . matches any character except a newline, so .+ cannot cross the blank lines around the header. Second, .+ is greedy, so once . is allowed to match newlines it can span multiple ---------- blocks.

You can use the following regex:

pattern = r"(?s)-+\n.+?\n-+"

The (?s) inline flag (re.DOTALL in Python, called "singleline" in some other regex flavors) makes . match any character, including a newline. The lazy .+? matches one or more characters, but as few as possible, so the match stops at the next ---------- line.
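To see both effects side by side, here is a minimal sketch. The sample string and variable names are made up for illustration (a second ---------- block is added so the greedy behavior becomes visible); only the patterns come from this thread.

```python
import re

# Hypothetical sample: two header blocks, each followed by an article line.
text = (
    "----------\n\nFT Weekend\n\n----------\n\n"
    "Why do we run marathons?\n"
    "----------\n\nSecond header\n\n----------\n\n"
    "Another article.\n"
)

# Original pattern: "." cannot match the blank line after the dashes,
# so nothing matches and the text comes back unchanged.
unchanged = re.sub(r"-+\n.+\n-+", "", text)

# DOTALL but still greedy: a single match runs from the first dashed line
# to the last one, swallowing the first article as well.
greedy = re.sub(r"(?s)-+\n.+\n-+", "", text)

# DOTALL and lazy: each header block is matched on its own.
lazy = re.sub(r"(?s)-+\n.+?\n-+", "", text)

print(unchanged == text)
print(repr(greedy))
print(repr(lazy))
```

With this input the greedy version keeps only the last article, while the lazy version keeps both.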

See IDEONE demo

For a more thorough cleanup that also strips the whitespace around the header, I'd recommend:

pattern = r"(?s)\s*-+\n.+?\n-+\s*"

See another demo
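A short sketch of what the extra \s* buys you, assuming the string from the question is laid out as in the literal below (the variable names are illustrative):

```python
import re

# The question's string rebuilt as a literal (assumed layout, shortened).
s = (
    "----------\n\nFT Weekend\n\n----------\n\n"
    "Why do we run marathons?\n"
    "Are marathons and cycling races about more than exercise?\n"
)

# Without \s*: the header goes away, but the blank lines after it remain.
plain = re.sub(r"(?s)-+\n.+?\n-+", "", s)

# With \s* on both sides: the surrounding whitespace is removed too.
trimmed = re.sub(r"(?s)\s*-+\n.+?\n-+\s*", "", s)

print(repr(plain))
print(repr(trimmed))
```

One thing to keep in mind: because \s* is applied on both sides, a header in the middle of a document would also lose the line breaks that separate it from the preceding paragraph, so the two paragraphs end up joined.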

Upvotes: 0

vks

Reputation: 67968

pattern = r"-+\n.+?\n-+"
re.sub(pattern, '', thestring, flags=re.DOTALL)

Just use the DOTALL flag. The problem with your regex was that, by default, . does not match \n, so you need to add the DOTALL flag explicitly to make it match \n.

See demo.

https://regex101.com/r/hR7tH4/24

or

pattern = r"-+\n[\s\S]+?\n-+"
re.sub(pattern, '', thestring)

if you don't want to add a flag.
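As a quick check that these spellings are interchangeable, here is a small sketch; the shortened sample string and the variable names are assumptions for illustration.

```python
import re

# Shortened version of the question's string (assumed layout).
s = "----------\n\nFT Weekend\n\n----------\n\nWhy do we run marathons?\n"

# flags must be passed by keyword: the fourth positional parameter
# of re.sub is count, not flags.
with_flag = re.sub(r"-+\n.+?\n-+", "", s, flags=re.DOTALL)

# The same flag written inline at the start of the pattern.
inline = re.sub(r"(?s)-+\n.+?\n-+", "", s)

# No flag at all: [\s\S] matches any character, newlines included.
char_class = re.sub(r"-+\n[\s\S]+?\n-+", "", s)

# A precompiled pattern carries its flags with it.
header = re.compile(r"-+\n.+?\n-+", re.DOTALL)
compiled = header.sub("", s)

print(with_flag == inline == char_class == compiled)
```

Which form to use is mostly a matter of taste; the compiled form is handy when the same pattern is applied to many strings.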

Upvotes: 4

Kasravnd

Reputation: 107297

Your regex doesn't match the expected part because . does not match a newline character by default. You can use the re.DOTALL flag (or its alias re.S) to force . to match newlines. Alternatively, you can use a negated character class:

>>> print(re.sub(r"-+[^-]+-+", '', s))


Why do we run marathons?
Are marathons and cycling races about more than exercise? What does the 
literature of endurance tell us about our thirst for self-imposed hardship? 
>>> 

Or, to also strip the non-word characters (here, the blank lines) that follow the header:

>>> print(re.sub(r"-+[^-]+-+[^\w]+", '', s))
Why do we run marathons?
Are marathons and cycling races about more than exercise? What does the 
literature of endurance tell us about our thirst for self-imposed hardship? 
>>> 
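One caveat worth checking before relying on [^-]+: it assumes the header text itself contains no hyphen. The sketch below uses a made-up header ("FT-Weekend") to show what happens otherwise, compared with the lazy DOTALL pattern from the other answers.

```python
import re

# Hypothetical header that contains a hyphen of its own.
s = "----------\n\nFT-Weekend\n\n----------\n\nWhy do we run marathons?\n"

# [^-]+ stops at the hyphen inside the header, so only part of it is removed.
negated = re.sub(r"-+[^-]+-+", "", s)

# The lazy DOTALL pattern is anchored on whole dashed lines instead.
lazy = re.sub(r"-+\n.+?\n-+", "", s, flags=re.DOTALL)

print(repr(negated))
print(repr(lazy))
```

For headers like the one in the question, which contain no hyphen, both approaches give the same result.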

Upvotes: 2
