Omar Taher

Reputation: 61

Mix regex matched groups

If I have an input such as SuMoTu 11:00AM - 1:00PM, is it possible to generate Su 11:00AM - 1:00PM, Mo 11:00AM - 1:00PM, Tu 11:00AM - 1:00PM using only a regex, without loops?

I want to pair the time range 11:00AM - 1:00PM with each of the associated days in SuMoTu.

Of course, there won't always be 3 days; the count ranges from 1 to 5. Each day is always represented by exactly 2 characters, and there is always exactly one time range.

Upvotes: 0

Views: 60

Answers (2)

vsemozhebuty

Reputation: 13822

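You can use a lookahead that combines capturing and non-capturing groups. Each day that is followed by at least one more day is replaced with itself plus the time range captured inside the lookahead; the last day is not matched, so it simply keeps the original time range. Here is an example in JavaScript: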

const re = /([A-Z][a-z])(?=(?:[A-Z][a-z])+ (\d\d?:\d\d(?:AM|PM) - \d\d?:\d\d(?:AM|PM)))/g;
const replacement = '$1 $2, ';

console.log('Su 11:00AM - 1:00PM'.replace(re, replacement));
console.log('SuMo 11:00AM - 1:00PM'.replace(re, replacement));
console.log('SuMoTu 11:00AM - 1:00PM'.replace(re, replacement));
console.log('SuMoTuWe 11:00AM - 1:00PM'.replace(re, replacement));
console.log('SuMoTuWeTh 11:00AM - 1:00PM'.replace(re, replacement));
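To make the expected results easier to check, here is a minimal sketch that wraps the same pattern in a function. The names dayRe and expandDays are made up for illustration, and the sketch assumes input of exactly the shape described in the question.

```javascript
// Same pattern as in the answer; dayRe and expandDays are hypothetical names.
const dayRe = /([A-Z][a-z])(?=(?:[A-Z][a-z])+ (\d\d?:\d\d(?:AM|PM) - \d\d?:\d\d(?:AM|PM)))/g;

function expandDays(input) {
  // Every day except the last is rewritten as "<day> <time>, ".
  // The last day is not matched and keeps the original time range.
  return input.replace(dayRe, '$1 $2, ');
}

console.log(expandDays('Su 11:00AM - 1:00PM'));
// Su 11:00AM - 1:00PM
console.log(expandDays('SuMoTu 11:00AM - 1:00PM'));
// Su 11:00AM - 1:00PM, Mo 11:00AM - 1:00PM, Tu 11:00AM - 1:00PM
```

If separate entries are needed, splitting the result on ', ' should give one string per day.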

Upvotes: 1

James K. Lowden

Reputation: 7837

"without the use of loops"

I think the answer is no, for theoretical reasons. A regular expression decides what to do as it proceeds along the input string. What you want is to match part of the string (Su), print it, skip ahead to the space, grab the rest of the line, print it, then backtrack to the next 2-letter day and repeat. There's no regular expression for that. There might be some kind of extended regular expression for it, but it would still be a loop.

However, you can get away with a very small loop:

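$ echo 'SuMoTu 11:00AM - 1:00PM' |
  awk '{ time = $2 " - " $4           # rebuild the time range from fields 2 and 4
         while ($1 ~ /^[SMTWF]/) {    # a day abbreviation is still left in $1
             day = substr($1, 1, 2)   # take the leading two-letter day
             $1 = substr($1, 3)       # drop it from $1
             print day, time
         }
       }'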
Su 11:00AM - 1:00PM
Mo 11:00AM - 1:00PM
Tu 11:00AM - 1:00PM

Explanation:

awk splits each input line into whitespace-delimited fields and numbers them $1, $2, and so on. As long as the line begins with a capital letter that starts a day name, the loop removes the first two letters and prints them along with the time range (built from $2 and $4).
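For comparison with the JavaScript answer, the same peel-off-two-letters loop might look roughly like this in JavaScript. This is only a sketch: splitDays is a hypothetical name, and it assumes each line has exactly the shape "<days> <start> - <end>".

```javascript
// Hypothetical port of the awk loop; assumes "<days> <start> - <end>".
function splitDays(line) {
  const fields = line.trim().split(/\s+/);      // like awk's $1, $2, ...
  let days = fields[0];                         // e.g. "SuMoTu"
  const time = fields[1] + ' - ' + fields[3];   // e.g. "11:00AM - 1:00PM"
  const out = [];
  while (/^[SMTWF]/.test(days)) {               // a day abbreviation remains
    out.push(days.slice(0, 2) + ' ' + time);    // leading two-letter day plus time
    days = days.slice(2);                       // drop that day and continue
  }
  return out;
}

console.log(splitDays('SuMoTu 11:00AM - 1:00PM').join('\n'));
// Su 11:00AM - 1:00PM
// Mo 11:00AM - 1:00PM
// Tu 11:00AM - 1:00PM
```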

This loop is "small" in the sense that each input line is read once and scanned N times, where N is the number of days in the leading string.

I doubt you'll find anything faster for this problem than awk without using a compiled language. Unless you're processing millions of lines in a very time-constrained situation, you won't need to, either. My little machine processed 1 million lines in 3 seconds.

Upvotes: 1
