Reputation: 23
I am learning regular expressions with Python and I want to write a regex that matches and captures the sentence(s) in the input below:
1. Food : Cake : Baked sweet food made from flour, sugar and other ingredients.
2. Electronics : Computer : A machine to carry out a computer programming operation.
Computers mainly consists of a CPU, monitor, keyboard and a mouse.
3. Automobile : Car : Car is a four wheeled motor vehicle used for transportation.
The expected output is the category, the item and the item's description. So for the first item, Cake, the regex should capture the groups "Food", "Cake" and "Baked sweet food made from flour, sugar and other ingredients."
My current RE looks like this:
[0-9]+\s*.\s*(\w*)\s*:\s*(\w*)\s*:\s*(.*)
This works for items whose description has no line break. If the description contains a line break, e.g. Computer in the example, the regex only matches the description up to the line break and discards the second sentence.
Please help me understand what I am missing here.
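The behaviour described above can be reproduced in a few lines. In Python's re module, "." matches any character except a newline unless the re.DOTALL (re.S) flag is set, so the final (.*) group stops at the end of the first line. This sketch uses only the Computer entry from the sample. Note that adding re.DOTALL alone is not a complete fix: on the whole input the greedy (.*) would then also swallow every following entry, which is why the answers below add a stopping condition.

```python
import re

# One entry from the question's sample whose description spans two lines.
text = ("2. Electronics : Computer : A machine to carry out a computer programming operation.\n"
        "Computers mainly consists of a CPU, monitor, keyboard and a mouse.")

# The pattern from the question, unchanged.
pattern = r'[0-9]+\s*.\s*(\w*)\s*:\s*(\w*)\s*:\s*(.*)'

# Default flags: '.' does not match '\n', so group 3 ends at the line break.
first_line_only = re.match(pattern, text).group(3)

# re.DOTALL: '.' also matches '\n', so group 3 runs to the end of the string.
full_description = re.match(pattern, text, flags=re.DOTALL).group(3)

print(repr(first_line_only))
print(repr(full_description))
```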
Upvotes: 2
Views: 59
Reputation: 3553
This may be a rudimentary approach, but it works on the sample input you've provided:
[0-9]+\s*.\s*(\w*)\s*:\s*(\w*)\s*:\s*((?:.*[\n\r]?)+?)(?=$|\d\s*\.)
Basically, the lazy group consumes the description one line at a time (newlines included) and stops as soon as the lookahead sees either the end of the input or another numeric index such as "2.".
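One way to check this pattern against the question's sample is a findall loop like the sketch below. The sample is written with the numeric indexes the pattern expects, and the .strip() call is an addition of this sketch to drop the trailing newline the lazy group captures.

```python
import re

# The question's sample, one numbered entry per line; entry 2 spans two lines.
text = """1. Food : Cake : Baked sweet food made from flour, sugar and other ingredients.
2. Electronics : Computer : A machine to carry out a computer programming operation.
Computers mainly consists of a CPU, monitor, keyboard and a mouse.
3. Automobile : Car : Car is a four wheeled motor vehicle used for transportation."""

# The pattern from this answer: the lazy group grows one line at a time until the
# lookahead sees either the end of the input ($) or the next "<digit>." index.
pattern = r'[0-9]+\s*.\s*(\w*)\s*:\s*(\w*)\s*:\s*((?:.*[\n\r]?)+?)(?=$|\d\s*\.)'

# strip() removes the trailing newline captured before the next index.
entries = [(cat, item, desc.strip()) for cat, item, desc in re.findall(pattern, text)]
for cat, item, desc in entries:
    print(cat, '|', item, '|', desc)
```

Because the lookahead only checks for a digit followed by a period, a description that itself contains something like "version 2." would be cut short at that point.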
You can see the implementation here
Upvotes: 1
Reputation: 195478
If the entries (category : item : description) are separated by a blank line (a double newline), you can use this example to parse them (regex101):
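import re
txt = '''1. Food : Cake : Baked sweet food made from flour, sugar and other ingredients.

2. Electronics : Computer : A machine to carry out a computer programming operation.
Computers mainly consists of a CPU, monitor, keyboard and a mouse.

3. Automobile : Car : Car is a four wheeled motor vehicle used for transportation.'''
# re.M lets ^ match at every line start; re.S lets . match newlines, so a
# description may span several lines until a blank line or the end of the string.
for cat, item, desc in re.findall(r'^(?:\d+)\.([^:]+):([^:]+):(.*?)(?:\n\n|\Z)', txt, flags=re.M|re.S):
    # The groups include the spaces around ':', so strip them before printing.
    print(cat.strip())
    print(item.strip())
    print(desc.strip())
    print('-' * 80)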
Prints:
Food
Cake
Baked sweet food made from flour, sugar and other ingredients.
--------------------------------------------------------------------------------
Electronics
Computer
A machine to carry out a computer programming operation.
Computers mainly consists of a CPU, monitor, keyboard and a mouse.
--------------------------------------------------------------------------------
Automobile
Car
Car is a four wheeled motor vehicle used for transportation.
--------------------------------------------------------------------------------
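If the entries are separated by a single newline instead (as in the question's sample), a variant of the same pattern can replace the (?:\n\n|\Z) terminator with a lookahead for the next numbered line. This is a sketch assuming every entry starts at the beginning of a line with "<digits>." and that no description line itself starts that way.

```python
import re

# Same entries, but WITHOUT a blank line between them (assumed input layout).
txt = '''1. Food : Cake : Baked sweet food made from flour, sugar and other ingredients.
2. Electronics : Computer : A machine to carry out a computer programming operation.
Computers mainly consists of a CPU, monitor, keyboard and a mouse.
3. Automobile : Car : Car is a four wheeled motor vehicle used for transportation.'''

# re.M: '^' matches at every line start; re.S: '.' also matches '\n'.
# The lazy description stops right before a newline followed by "<digits>."
# or at the very end of the string (\Z). The lookahead consumes nothing.
pattern = r'^\d+\.([^:]+):([^:]+):(.*?)(?=\n\d+\.|\Z)'

# Strip the spaces around ':' from every captured group.
entries = [tuple(part.strip() for part in match)
           for match in re.findall(pattern, txt, flags=re.M | re.S)]
for cat, item, desc in entries:
    print(cat)
    print(item)
    print(desc)
    print('-' * 80)
```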
Upvotes: 2