Reputation: 135
I have a string that I want to split into a list of known types. For example, I want to split Starter Main Course Dessert
into [Starter, Main Course, Dessert].
I cannot use split() because it would break Main Course into two items.
How can I do the splitting? Is a regex needed?
Upvotes: 1
Views: 126
Reputation: 54223
If you have a list of acceptable words, you could build a regex alternation (a union of the words joined with |):
import re
acceptable_words = ['Starter', 'Main Course', 'Dessert', 'Coffee', 'Aperitif']
pattern = re.compile("("+"|".join(acceptable_words)+")", re.IGNORECASE)
# "(Starter|Main Course|Dessert|Coffee|Aperitif)"
menu = "Starter Main Course NotInTheList dessert"
print(pattern.findall(menu))
# ['Starter', 'Main Course', 'dessert']
If you only want to list the special multi-word substrings and let every other word match on its own, you could use a \w+ fallback as the last entry:
acceptable_words = ['Main Course', r'\w+']
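As a rough sketch of that second variant, assuming the same pattern construction as above (the sample menu string is only an illustration):

```python
import re

# Multi-word entries come first so they are tried before the generic \w+ fallback.
acceptable_words = ['Main Course', r'\w+']
pattern = re.compile("(" + "|".join(acceptable_words) + ")", re.IGNORECASE)

menu = "Starter Main Course Dessert"
tokens = pattern.findall(menu)
print(tokens)
# ['Starter', 'Main Course', 'Dessert']
```

The order of the alternatives matters here: if \w+ were listed first, it would match Main on its own before Main Course got a chance.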
Upvotes: 3
Reputation: 2229
I think it's more practical to specify only the 'special' two-word tokens.
special_words = ['Main Course', 'Something Special']
sentence = 'Starter Main Course Dessert Something Special Date'
words = sentence.split(' ')
for i in range(len(words) - 1):
    try:
        idx = special_words.index(str(words[i]) + ' ' + words[i+1])
        words[i] = special_words[idx]
        words[i+1] = None
    except ValueError:
        pass
words = list(filter(lambda x: x is not None, words))
print(words)
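If the special tokens might be longer than two words, the same idea could be generalized roughly as follows. This is only a sketch: the function name split_with_phrases and its parameters are made up for illustration, and it assumes tokens are separated by whitespace.

```python
def split_with_phrases(sentence, special_phrases):
    """Split on whitespace, but keep any phrase from special_phrases together."""
    # Pre-split each phrase into words; try longer phrases first.
    # Empty phrases are skipped so the loop below always advances.
    phrases = sorted(
        (p.split() for p in special_phrases if p.split()),
        key=len,
        reverse=True,
    )
    words = sentence.split()
    result = []
    i = 0
    while i < len(words):
        for phrase in phrases:
            if words[i:i + len(phrase)] == phrase:
                result.append(' '.join(phrase))
                i += len(phrase)
                break
        else:
            # No special phrase starts at this position, so keep the single word.
            result.append(words[i])
            i += 1
    return result


print(split_with_phrases('Starter Main Course Dessert Something Special Date',
                         ['Main Course', 'Something Special']))
# ['Starter', 'Main Course', 'Dessert', 'Something Special', 'Date']
```

Unlike the regex version, this comparison is case-sensitive and only matches whole words.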
Upvotes: 0