Vova

Reputation: 599

Split text by sentences

I ran into a problem finding a convenient way to split text by a list of predefined sentences. The sentences can contain any special characters and are otherwise completely arbitrary.

Example:

text = "My name. is A. His name is B. Her name is C. That's why..."
delims = ["My name. is", "His name is", "Her name is"]

I want something like:

def custom_sentence_split(text, delims):
    # stuff
    return result

custom_sentence_split(text, delims)
# ["My name. is", " A. ", "His name is", " B. ", "Her name is", " C. That's why..."]

UPD. There is an inconvenient solution like the one below, but I'd prefer something more convenient.


import re


def collect_output(text, finds):
    # Walk through the matches in order, cutting the text at each one.
    rest = text
    retn = []
    for found in finds:
        before, rest = rest.split(found, 1)
        retn += [before, found]
    retn.append(rest)  # keep whatever follows the last match
    return retn


def custom_sentence_split(text, splitters):
    # Escape each splitter so special characters are matched literally.
    pattern = "(" + "|".join(map(re.escape, splitters)) + ")"
    finds = re.findall(pattern, text)
    return collect_output(text, finds)

UPD2: It seems a working solution has been found.

import re

pattern = "(" + "|".join(map(re.escape, delims)) + ")"
re.split(pattern, text)
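One caveat with the re.split approach: when the text starts (or ends) with a delimiter, the result contains an empty string at that position. Also, if one delimiter is a prefix of another, the shorter one can win the alternation. A minimal sketch that handles both, assuming empty pieces are not wanted and that the longest delimiter should match first (both are my assumptions, not part of the original requirement):

```python
import re


def custom_sentence_split(text, delims):
    # Try longer delimiters first so "abc" is preferred over its prefix "ab".
    ordered = sorted(delims, key=len, reverse=True)
    pattern = "(" + "|".join(map(re.escape, ordered)) + ")"
    # The capturing group makes re.split keep the delimiters;
    # empty pieces (e.g. before a leading delimiter) are dropped.
    return [piece for piece in re.split(pattern, text) if piece]


text = "My name. is A. His name is B. Her name is C. That's why..."
delims = ["My name. is", "His name is", "Her name is"]
print(custom_sentence_split(text, delims))
```

Note that an empty delims list is not handled here; it would need a separate guard.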

Upvotes: 0

Views: 130

Answers (2)

tomdartmoor

Reputation: 262

You want to use the re.split function.

You will need a regex string like (My\sname\sis|His\sname\sis|Her\sname\sis)

You could construct your regex string like "("+"|".join(map(re.escape, delims))+")"

Edit: You could do something like this:

import re

text = "My name is A. His name is B. Her name is C. That's why..."
delims = ["My name is", "His name is", "Her name is"]


def custom_sentence_split(text, delims):
    # Escape the delimiters and capture them so re.split keeps them in the result.
    pattern = "(" + "|".join(map(re.escape, delims)) + ")"
    return re.split(pattern, text)


print(custom_sentence_split(text, delims))
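Since the question says the sentences can contain any special characters, it may help to see why re.escape matters. The sketch below uses made-up delimiters containing parentheses and a question mark (the sample text and delimiters are hypothetical, chosen only for illustration) and compares an unescaped pattern with an escaped one:

```python
import re

# Hypothetical sample data with regex metacharacters in the delimiters.
text = "Price (USD)? 10. Price (EUR)? 9."
delims = ["Price (USD)?", "Price (EUR)?"]

# Unescaped: "(", ")" and "?" are interpreted as regex syntax.
raw = "(" + "|".join(delims) + ")"
# Escaped: every delimiter is matched literally.
escaped = "(" + "|".join(map(re.escape, delims)) + ")"

print(re.split(raw, text))      # splits in the wrong places, with extra group entries
print(re.split(escaped, text))  # ['', 'Price (USD)?', ' 10. ', 'Price (EUR)?', ' 9.']
```

The leading empty string in the escaped result appears because the text starts with a delimiter.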

Upvotes: 1

LetzerWille

Reputation: 5668

import re

text = "My name is A. His name is B. Her name is C. That's why..."

print([x.strip() for x in re.split(r'(.+?[A-Z]\.)', text) if x])

['My name is A.', 'His name is B.', 'Her name is C.', "That's why..."]

Upvotes: 1
