Delavine

Reputation: 35

How do I extract sentences in regex that are separated by a word followed by a colon?

I'm trying to write a regex pattern to pull the location, features, and payment accepted sections out of a block of text. I'm building a website that shows food options, and the API's description field contains far more information than I need, which is why I want to extract only specific parts.

I looked into positive and negative lookahead in regex, but I still couldn't solve my problem. I can select everything up to the second section, but that only works when the section I want is the first one, the location. If I want the features, I also end up selecting the previous section, the location. See the text below as an example.

Here's the text I want to extract from:

Location: Village 1 \r\n\r\nFeatures:  A multitude of offerings, including entrees, hot meals, wood-fired pizza, salad bar, grill items, made-to-order deli sandwiches & wraps, convenience items and much more\r\n\r\nPayment accepted: cash, Watcard  \r\n\r\nThis is a great place to meet your friends! The aroma of fresh baked breads and pastries from our in-house UW Bakery will surely make you take a deep breath. Mudie’s offers a large selection of vegetarian foods, grab n’ go items, salad bar, grill items, made-to-order deli sandwiches and pitas, full breakfast, and convenience foods. A hot entrée item and side dishes are available every lunch and dinner hour.\r\n\r\nMeal hours for Mom's Counter*:\r\n\r\nBreakfast: 7:30 - 11:00 am\r\n\r\nLunch:11:30 am - 2:00 pm\r\n\r\nDinner: 4:30 - 8:00 pm \r\n\r\n*please note, these hours are subject to change without notice "

I wrote this so far:

  /.+?(?=Payment accepted)/

which selects everything up to the Payment accepted section. I also wrote

/(Location|Features|Payment accepted):\s{1,4}?[A-Z]+\s?\d?/

which picks out the start of each of my three desired sections. I can't work out how to combine the two, or how to write a pattern that selects one section without also including another. Any help would be appreciated.

So in the above case, my extracted parts would be:

Location: Village 1
Features:  A multitude of offerings, including entrees, hot meals, wood-fired pizza, salad bar, grill items, made-to-order deli sandwiches & wraps, convenience items and much more
Payment accepted: cash, Watcard

Upvotes: 1

Views: 46

Answers (2)

Sam Chen

Reputation: 157

If I'm understanding this correctly and you're sure that the sections always come in the same order, then you can just chain the patterns for each section back to back.

Is something like this what you were looking for?

Location:\s?([\w\d ]+)\s{1,5}Features:\s+(.+)\s{1,5}Payment accepted:\s?(.+)
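As a rough sketch of how that pattern could be applied in JavaScript: the description variable and its shortened sample text are assumptions made for illustration, and the trim() calls are an addition to drop the trailing spaces the capture groups pick up.

```javascript
// Hypothetical, shortened sample of the description text from the question.
const description =
  "Location: Village 1 \r\n\r\n" +
  "Features:  A multitude of offerings, including entrees and much more\r\n\r\n" +
  "Payment accepted: cash, Watcard  \r\n\r\n" +
  "This is a great place to meet your friends!";

// The back-to-back pattern: one capture group per section, in a fixed order.
const sectionPattern =
  /Location:\s?([\w\d ]+)\s{1,5}Features:\s+(.+)\s{1,5}Payment accepted:\s?(.+)/;

const match = description.match(sectionPattern);

// match is null when a section is missing or the order differs.
const sections = match
  ? {
      location: match[1].trim(),
      features: match[2].trim(),
      payment: match[3].trim(),
    }
  : null;

console.log(sections);
```

Because the dot does not match \r or \n in JavaScript, each (.+) group stops at the end of its own line, which is what keeps one section from running into the next.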

Upvotes: 0

Pushpesh Kumar Rajwanshi

Reputation: 18357

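You can use this regex to extract those three sections of text. Note that in JavaScript \v matches only a vertical tab, not every line break, so [^\r\n] is used here to stop each capture at the end of its line:

/Location:\s*([^\r\n]*?)\s*Features:\s*([^\r\n]*?)\s*Payment accepted:\s*([^\r\n]*)/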

Here is the JS code for the same.

    var myString = "Location: Village 1 \r\n\r\nFeatures:  A multitude of offerings, including entrees, hot meals, wood-fired pizza, salad bar, grill items, made-to-order deli sandwiches & wraps, convenience items and much more\r\n\r\nPayment accepted: cash, Watcard  \r\n\r\nThis is a great place to meet your friends! The aroma of fresh baked breads and pastries from our in-house UW Bakery will surely make you take a deep breath. Mudie’s offers a large selection of vegetarian foods, grab n’ go items, salad bar, grill items, made-to-order deli sandwiches and pitas, full breakfast, and convenience foods. A hot entrée item and side dishes are available every lunch and dinner hour.\r\n\r\nMeal hours for Mom's Counter*:\r\n\r\nBreakfast: 7:30 - 11:00 am\r\n\r\nLunch:11:30 am - 2:00 pm\r\n\r\nDinner: 4:30 - 8:00 pm \r\n\r\n*please note, these hours are subject to change without notice ";

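    // In JavaScript \v matches only a vertical tab, so [^\v]* would also capture
    // the trailing \r\n; the lazy [^\r\n]*? stops each group at the end of its line.
    var arr = /Location:\s*([^\r\n]*?)\s*Features:\s*([^\r\n]*?)\s*Payment accepted:\s*([^\r\n]*)/.exec(myString);

    // arr[0] is the whole match; arr[1] to arr[3] are the three captured sections.
    console.log("Location --> " + arr[1]);
    console.log("Features --> " + arr[2]);
    // trim() drops the trailing spaces before the line break.
    console.log("Payment accepted --> " + arr[3].trim());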
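If the sections are not guaranteed to appear in this order, one alternative is to match each labelled line on its own instead of chaining them. This is only a minimal sketch, assuming every section starts at the beginning of a line; extractSections and the shortened sample string are hypothetical names used for illustration.

```javascript
// Hypothetical helper: collects "Label: value" lines for the three wanted labels.
function extractSections(text) {
  // m flag: ^ matches at the start of every line; g flag: find every section.
  // [^\r\n]* stops at the end of the line, so no section bleeds into the next.
  const pattern = /^(Location|Features|Payment accepted):[ \t]*([^\r\n]*)/gm;
  const result = {};
  for (const [, label, value] of text.matchAll(pattern)) {
    result[label] = value.trim();
  }
  return result;
}

// Shortened sample text; "Breakfast:" is ignored because it is not a wanted label.
const sample =
  "Location: Village 1 \r\n\r\n" +
  "Features:  A multitude of offerings and much more\r\n\r\n" +
  "Payment accepted: cash, Watcard  \r\n\r\n" +
  "Breakfast: 7:30 - 11:00 am";

console.log(extractSections(sample));
```

The result is an object keyed by label, so a missing section simply leaves its key absent instead of making the whole match fail.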

Upvotes: 2
