lennyd

Reputation: 33

Regex - repeating matches

Another regexp question

I have input text like this:

test start first end start second end start third end

and I need matches like this:

test first
test second 
test third

I've tried something like this:

start(.*?)end

but how do I add "test" to each match?

Thanks for any suggestions.

Lennyd

(edited: there was a mistake in the input text)


I can't use a programming language for this; it has to be just a regexp. I need it for parsing a web page that contains (in part) a structure like this:

Season 1
    Episode 1
    Episode 2
    Episode 3
Season 2
    Episode 1
    Episode 2
...etc

and with this regexp I need output like this:


<episodeslist>
  <episode season="1" episode="1">
  <episode season="1" episode="2">
... etc

In detail: it is for an xmbc.org media scraper.

Upvotes: 0

Views: 875

Answers (2)

Luxvero

Reputation: 11

Am I the only one who didn't understand what lennyd wants in the first example?

Now, for this one:

input

Season 1
  Episode 1
  Episode 2
  Episode 3

output

<episodeslist>
  <episode season="1" episode="1">
  <episode season="1" episode="2">

Assuming you're using a regex tool that can match across multiple lines:

catch
/Season[^0-9]*([0-9]+)[^\n]*[\s]+Episode[^0-9]*([0-9]+)\n/gs
Add as many `[\s]+Episode[^0-9]*([0-9]+)\n` as needed, one per episode.

return

<episodeslist>
<episode season="$1" episode="$2">
<episode season="$1" episode="$3">
<episode season="$1" episode="$4">
<episode season="$1" episode="$5">

I'm just not sure about `[^\n]`; use `[^E]` instead if the input is really that clean.
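As a rough illustration of the fixed-count pattern above, here is a minimal Python sketch for a season with exactly three episodes. The use of Python's `re` module and the names `text`, `episode`, `pattern`, `replacement` and `result` are assumptions for the example only; the scraper in the question may use a different regex engine and a different replacement syntax (`$1` rather than `\1`).

```python
import re

# Sample input copied from the question (Season 1 with three episodes).
text = "Season 1\n  Episode 1\n  Episode 2\n  Episode 3\n"

# One "Season" header followed by exactly three "Episode" lines.
# The episode fragment is repeated once per expected episode.
episode = r"\s+Episode[^0-9]*([0-9]+)\n"
pattern = re.compile(r"Season[^0-9]*([0-9]+)[^\n]*" + episode * 3)

# \1 is the season number, \2 to \4 are the episode numbers.
replacement = (
    "<episodeslist>\n"
    '  <episode season="\\1" episode="\\2">\n'
    '  <episode season="\\1" episode="\\3">\n'
    '  <episode season="\\1" episode="\\4">\n'
)

result = pattern.sub(replacement, text)
print(result)
```

A season with a different number of episodes would not match this pattern at all, which is why a separate pattern per episode count is needed with this approach.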

If the number of episodes varies between 24 and 26, just run 3 regexes, one per episode count.

If you want something more flexible, you'll need a more powerful tool, like grep on Linux or one of its clones with a UI for other OSes, that can do "regex inside regex".

If it's a scripting language running regex functions, you could easily wrap the following in a loop, until the input no longer matches anything:
{

1 - Match only `Season[^0-9]*([0-9]+)`, strip it off the input, store the season # in a variable,  
2 - Match a block of episodes `([\s]+Episode[^0-9]*[0-9]+\n)+`  
3 - Then inside that block match single lines `[\s]+Episode[^0-9]*[0-9]+`  
4 - Using the season variable, output the appropriate XML  

}
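The loop above could be sketched in Python roughly as follows. This is only an illustration under a few assumptions: `episodes_to_xml` is a hypothetical helper name, the input looks like the sample in the question, and `re.finditer` is used to walk through the seasons instead of stripping matched text off the input.

```python
import re

# Steps 1-2: a season header, then the block of episode lines that follows it.
SEASON_RE = re.compile(
    r"Season[^0-9]*([0-9]+)[^\n]*((?:\s+Episode[^0-9\n]*[0-9]+)+)"
)
# Step 3: a single episode line inside that block.
EPISODE_RE = re.compile(r"Episode[^0-9\n]*([0-9]+)")


def episodes_to_xml(text):
    """Return one <episode> line per episode, tagged with its season number."""
    lines = ["<episodeslist>"]
    for season_match in SEASON_RE.finditer(text):
        season = season_match.group(1)        # season number kept in a variable
        block = season_match.group(2)         # all episode lines of this season
        for episode_match in EPISODE_RE.finditer(block):
            # Step 4: emit the XML line using the stored season number.
            lines.append(
                f'  <episode season="{season}" episode="{episode_match.group(1)}">'
            )
    return "\n".join(lines)


sample = (
    "Season 1\n    Episode 1\n    Episode 2\n    Episode 3\n"
    "Season 2\n    Episode 1\n    Episode 2\n"
)
print(episodes_to_xml(sample))
```

Because the inner pattern runs only on the text captured for one season, the number of episodes per season does not have to be known in advance.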

Upvotes: 1

dma_k

Reputation: 10639

A very primitive regex would be:

echo "test start first end start second end test third end" |
     perl -ne 'print "$1 -> $2\n" while (/(\w+).*?(\w+) end/g);'
test -> first
start -> second
test -> third

but I agree with Alan Moore that your sample output is a bit weird.
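For readers without Perl at hand, the same loop might look like this in Python. It is a sketch only: it reuses the regex and the input string from the one-liner above, and the names `text` and `pairs` are just for illustration.

```python
import re

# Same input string as in the Perl one-liner.
text = "test start first end start second end test third end"

# Each match captures a leading word and the word directly before " end".
pairs = [
    (match.group(1), match.group(2))
    for match in re.finditer(r"(\w+).*?(\w+) end", text)
]

for left, right in pairs:
    print(f"{left} -> {right}")
```

Each match starts where the previous one stopped, so the first captured word is simply whatever word comes next in the input; it is not a prefix carried over from the start of the string.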

Upvotes: 0
