Reputation: 967
I have a string and it has a number of substrings that I'd like to delete.
Each of the substrings starts with ApPle and ends with THE BEST PIE — STRAWBERRY.
I tried the suggestions on this post, but they didn't work.
Input
Cannoli (Italian pronunciation: [kanˈnɔːli]; Sicilian: cannula) are Italian ApPle Sep 12 THE BEST PIE —
STRAWBERRY pastries that originated on the island of Sicily and are today a staple of Sicilian cuisine1[2] as well as Italian-American cuisine. Cannoli consist of tube-shaped shells of fried pastry dough, filled with a sweet, creamy filling usually ApPle Aug 4 THE BEST PIE — STRAWBERRY containing ricotta. They range in size from "cannulicchi", no bigger than a finger, to the fist-sized proportions typically found south of Palermo, Sicily, in Piana degli Albanesi.[2]
import re
array = []
# Open the file and strip the newlines
with open('canoli.txt', 'r') as myfile:
    file = myfile.readlines()
array = [s.rstrip('\n') for s in file]
text = ' '.join(array)
attempt1 = re.sub(r'/ApPle+THE.BEST.PIE.-.STRAWBERRY/','',text)
attempt2 = re.sub(r'/ApPle:.*?:THE.BEST.PIE.-.STRAWBERRY/','',text)
print(attempt1)
print(attempt2)
Desired Output
Cannoli (Italian pronunciation: [kanˈnɔːli]; Sicilian: cannula) are Italian pastries that originated on the island of Sicily and are today a staple of Sicilian cuisine1[2] as well as Italian-American cuisine. Cannoli consist of tube-shaped shells of fried pastry dough, filled with a sweet, creamy filling usually containing ricotta. They range in size from "cannulicchi", no bigger than a finger, to the fist-sized proportions typically found south of Palermo, Sicily, in Piana degli Albanesi.[2]
Upvotes: 0
Views: 95
Reputation: 4341
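Your attempts fail for a few reasons. The surrounding / characters are regex delimiters in other languages, but in Python they are matched literally, and your text contains no slashes. In the first attempt, ApPle+ only means "ApPl" followed by one or more "e", so nothing in the pattern covers the date between the two markers. The second attempt requires ":" characters that do not appear in the text.
I think your regex should be: ApPle.*?THE\sBEST\sPIE\s—\sSTRAWBERRY
You should also add the DOTALL flag so that . matches newline characters too, in case a substring spans more than one line. Try this: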
re.sub(r'ApPle.*?THE\sBEST\sPIE\s—\sSTRAWBERRY','',text, flags=re.DOTALL)
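To check the pattern without the file, here is a minimal sketch run on a shortened copy of the input. The helper name strip_markers and the sample string are made up for illustration. The trailing \s* is an addition to the pattern above: without it, each removal leaves two spaces where the substring used to be. \s+ is used between the words so that a line break inside the end marker is tolerated as well.

```python
import re

# Lazy match from the start marker to the nearest end marker.
# \s+ between the words also matches a line break, and DOTALL lets "." cross newlines.
# The trailing \s* (an assumption, not part of the answer's pattern) swallows the
# whitespace after the marker so the result has no double spaces.
PATTERN = re.compile(
    r'ApPle.*?THE\s+BEST\s+PIE\s+—\s+STRAWBERRY\s*',
    flags=re.DOTALL,
)

def strip_markers(text: str) -> str:
    """Remove every span that starts with ApPle and ends with the STRAWBERRY marker."""
    return PATTERN.sub('', text)

# Shortened, hypothetical sample: the first marker is split across two lines.
sample = (
    "are Italian ApPle Sep 12 THE BEST PIE —\n"
    "STRAWBERRY pastries, filled with a sweet, creamy filling usually "
    "ApPle Aug 4 THE BEST PIE — STRAWBERRY containing ricotta."
)

print(strip_markers(sample))
# are Italian pastries, filled with a sweet, creamy filling usually containing ricotta.
```

Because the pattern handles line breaks itself, it should also work on the raw result of myfile.read(), without joining the lines first.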
Upvotes: 1