Sebastian

Reputation: 967

Python: Find and remove a string starting and ending with a specific substring

I have a string and it has a number of substrings that I'd like to delete.

Each of the substrings starts with ApPle and ends with THE BEST PIE — STRAWBERRY.

I tried the suggestions on this post, but they didn't work.

Input

Cannoli (Italian pronunciation: [kanˈnɔːli]; Sicilian: cannula) are Italian ApPle Sep 12 THE BEST PIE —
STRAWBERRY pastries that originated on the island of Sicily and are today a staple of Sicilian cuisine1[2] as well as Italian-American cuisine. Cannoli consist of tube-shaped shells of fried pastry dough, filled with a sweet, creamy filling usually ApPle Aug 4 THE BEST PIE — STRAWBERRY containing ricotta. They range in size from "cannulicchi", no bigger than a finger, to the fist-sized proportions typically found south of Palermo, Sicily, in Piana degli Albanesi.[2]

import re
array = []

#open the file and delete new lines
with open('canoli.txt', 'r') as myfile:
    file = myfile.readlines()
    array = [s.rstrip('\n') for s in file]
    text = ' '.join(array)

attempt1 = re.sub(r'/ApPle+THE.BEST.PIE.-.STRAWBERRY/','',text)
attempt2 = re.sub(r'/ApPle:.*?:THE.BEST.PIE.-.STRAWBERRY/','',text)
print(attempt1)
print(attempt2)

Desired Output

Cannoli (Italian pronunciation: [kanˈnɔːli]; Sicilian: cannula) are Italian pastries that originated on the island of Sicily and are today a staple of Sicilian cuisine1[2] as well as Italian-American cuisine. Cannoli consist of tube-shaped shells of fried pastry dough, filled with a sweet, creamy filling usually containing ricotta. They range in size from "cannulicchi", no bigger than a finger, to the fist-sized proportions typically found south of Palermo, Sicily, in Piana degli Albanesi.[2]

Upvotes: 0

Views: 95

Answers (1)

Keatinge

Reputation: 4341

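I think your regex should be: ApPle.*?THE\sBEST\sPIE\s—\sSTRAWBERRY

Python patterns are not wrapped in /.../ delimiters, so the slashes in your attempts are matched literally, and your text contains an em dash (—) rather than a hyphen. You also need the re.DOTALL flag so that . matches newlines between the two markers. Try this: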

re.sub(r'ApPle.*?THE\sBEST\sPIE\s—\sSTRAWBERRY','',text, flags=re.DOTALL)
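
For reference, here is a minimal, self-contained sketch of the same substitution run against an abbreviated version of the question's input. The function name strip_marked_spans and the shortened sample string are made up for illustration, and the whitespace cleanup at the end is an extra step assumed here to reach the desired output; it is not part of the regex above.

```python
import re

# Lazy match from "ApPle" to the nearest closing marker.
# \s matches any whitespace, so a line break inside the closing marker is fine;
# re.DOTALL additionally lets "." cross a line break between the two markers.
PATTERN = re.compile(r"ApPle.*?THE\sBEST\sPIE\s—\sSTRAWBERRY", flags=re.DOTALL)


def strip_marked_spans(text):
    """Hypothetical helper: delete every ApPle ... STRAWBERRY span."""
    cleaned = PATTERN.sub("", text)
    # Each deletion leaves the spaces from both sides behind; collapse runs of
    # whitespace to one space (assumption: this also flattens line breaks).
    return re.sub(r"\s{2,}", " ", cleaned)


# Abbreviated sample, including the line break seen in the question's input.
sample = (
    "are Italian ApPle Sep 12 THE BEST PIE —\n"
    "STRAWBERRY pastries, filled with a sweet filling usually "
    "ApPle Aug 4 THE BEST PIE — STRAWBERRY containing ricotta."
)

print(strip_marked_spans(sample))
```

If the text has already been joined into one line, as in the question's code, re.DOTALL has no effect but is harmless to keep.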

Upvotes: 1
