alryosha

Reputation: 743

Python regular expression split by multiple delimiters

Given the sentence "I want to eat fish and I want to buy a car. Therefore, I have to make money."

I want to split the sentence into:

['I want to eat fish', 'I want to buy a car', 'Therefore, I have to make money']

I tried splitting the sentence with:

re.split('.|and', sentence)

However, it does not split only on '.' and 'and'. Instead it splits on every character and returns a list of empty strings.

How can I split the sentence by '.' and 'and'?
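A quick check shows what the unescaped pattern actually does. The alternation tries '.' first, and an unescaped '.' matches any character except a newline, so every character of the sentence becomes a delimiter. This is a minimal sketch using the sentence from the question; the variable name "broken" is only for illustration.

```python
import re

sentence = "I want to eat fish and I want to buy a car. Therefore, I have to make money."

# '.' is a metacharacter here, so each single character is a split point.
broken = re.split('.|and', sentence)

print(set(broken))                       # {''}: every piece is empty
print(len(broken) == len(sentence) + 1)  # True: one split per character
```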

Upvotes: 1

Views: 123

Answers (2)

blhsing

Reputation: 106455

In addition to escaping the dot (.), which in a regex matches any character except a newline, you should also match the whitespace around each delimiter, so that the split consumes it instead of leaving leading and trailing spaces in the results. Finally, add a positive lookahead that requires a non-whitespace character after the delimiter, so the dot at the very end of the sentence is not treated as a split point:

re.split(r'\s*(?:\.|and)\s*(?=\S)', sentence)

This returns:

['I want to eat fish', 'I want to buy a car', 'Therefore, I have to make money.']
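To make each part of that pattern easier to follow, here is the same regex written with re.VERBOSE, which ignores whitespace in the pattern and allows # comments. This is a sketch of an equivalent form, not a different technique; the name DELIMITER is just for illustration.

```python
import re

# The answer's pattern, split across lines so each piece can be commented.
DELIMITER = re.compile(r"""
    \s*          # swallow whitespace before the delimiter
    (?:\.|and)   # a literal period, or the letters 'and'
    \s*          # swallow whitespace after the delimiter
    (?=\S)       # lookahead: a non-whitespace character must follow,
                 # so the final period of the string is not a split point
""", re.VERBOSE)

sentence = "I want to eat fish and I want to buy a car. Therefore, I have to make money."
print(DELIMITER.split(sentence))
# ['I want to eat fish', 'I want to buy a car', 'Therefore, I have to make money.']
```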

Demo: https://replit.com/@blhsing/LimitedVastCookies
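One caveat not covered above: a bare 'and' in the pattern also matches inside words, so a sentence containing "brand" would be split in the middle of that word. A possible refinement, sketched below under that assumption, wraps word delimiters in \b (word boundary) and passes every delimiter through re.escape so punctuation such as '.' is matched literally. The helper split_on and its parameters are hypothetical, not part of the answer.

```python
import re

def split_on(sentence, words=("and",), punctuation=(".",)):
    """Hypothetical helper: split on whole words or literal punctuation."""
    # \b on both sides keeps 'and' from matching inside 'brand' or 'handle'.
    word_alts = [r"\b" + re.escape(w) + r"\b" for w in words]
    # re.escape turns '.' into '\.', i.e. a literal period.
    punct_alts = [re.escape(p) for p in punctuation]
    alternation = "|".join(word_alts + punct_alts)
    # Same structure as the answer: eat surrounding whitespace and require a
    # following non-whitespace character so a trailing delimiter is ignored.
    pattern = r"\s*(?:" + alternation + r")\s*(?=\S)"
    return re.split(pattern, sentence)

print(split_on("I want to eat fish and I want to buy a car. Therefore, I have to make money."))
# ['I want to eat fish', 'I want to buy a car', 'Therefore, I have to make money.']
print(split_on("I saw a brand new car and a boat."))
# ['I saw a brand new car', 'a boat.']
```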

Upvotes: 2

Dan Nagle

Reputation: 5425

You need to escape the . in the regex.

import re

s = "I want to eat fish and I want to buy a car. Therefore, I have to make money."

re.split(r'\.|and', s)

Result:

['I want to eat fish ',
 ' I want to buy a car',
 ' Therefore, I have to make money',
 '']
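If you keep this simpler escaped pattern, the stray spaces and the trailing empty string (left by the final '.') can be cleaned up after splitting. A minimal sketch, using the same string s:

```python
import re

s = "I want to eat fish and I want to buy a car. Therefore, I have to make money."

# Split on a literal period or 'and', then trim each piece and drop empty ones.
parts = [p.strip() for p in re.split(r'\.|and', s) if p.strip()]
print(parts)
# ['I want to eat fish', 'I want to buy a car', 'Therefore, I have to make money']
```

Note that, like the pattern it builds on, this still matches 'and' inside longer words.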

Upvotes: 1
