visylvius

Reputation: 65

Regex to grab all text before and after match, and stop before second keyword is found

I'd like to create a regex that would be able to grab everything up to and after DESCRIPTION, until the next TITLE: is found.

  const data = `TITLE: Hitchhiker's Guide to the Galaxy
  AUTHOR: Douglas Adams
  DESCRIPTION: Seconds before the Earth is demolished to make way for a galactic freeway,
  Arthur Dent is plucked off the planet by his friend Ford Prefect, a researcher for the 
  revised edition of The Hitchhiker's Guide to the Galaxy who, for the last fifteen 
  years, has been posing as an out-of-work actor. 
  TITLE: Dune
  AUTHOR: Frank Herbert
  DESCRIPTION: The troubles begin when stewardship of Arrakis is transferred by the
  Emperor from the Harkonnen Noble House to House Atreides. The Harkonnens don't want to
  give up their privilege, though, and through sabotage and treachery they cast young 
  Duke Paul Atreides out into the planet's harsh environment to die. There he falls in 
  with the Fremen, a tribe of desert dwellers who become the basis of the army with which
  he will reclaim what's rightfully his. Paul Atreides, though, is far more than just a 
  usurped duke. He might be the end product of a very long-term genetic experiment 
  designed to breed a super human; he might be a messiah. His struggle is at the center
  of a nexus of powerful people and events, and the repercussions will be felt throughout 
  the Imperium.
  TITLE: A Song Of Ice And Fire Series
  AUTHOR: George R.R. Martin
  DESCRIPTION: As the Seven Kingdoms face a generation-long winter, the noble Stark 
  family confronts the poisonous plots of the rival Lannisters, the emergence of the 
  White Walkers, the arrival of barbarian hordes, and other threats.`

My desired output would be

[
  "TITLE: Hitchhiker's Guide to the Galaxy
  AUTHOR: Douglas Adams
  DESCRIPTION: Seconds before the Earth is demolished to make way for a galactic freeway,
  Arthur Dent is plucked off the planet by his friend Ford Prefect, a researcher for the 
  revised edition of The Hitchhiker's Guide to the Galaxy who, for the last fifteen 
  years, has been posing as an out-of-work actor.",

  "TITLE: Dune
  AUTHOR: Frank Herbert
  DESCRIPTION: The troubles begin when stewardship of Arrakis is transferred by the
  Emperor from the Harkonnen Noble House to House Atreides. The Harkonnens don't want to
  give up their privilege, though, and through sabotage and treachery they cast young 
  Duke Paul Atreides out into the planet's harsh environment to die. There he falls in 
  with the Fremen, a tribe of desert dwellers who become the basis of the army with which
  he will reclaim what's rightfully his. Paul Atreides, though, is far more than just a 
  usurped duke. He might be the end product of a very long-term genetic experiment 
  designed to breed a super human; he might be a messiah. His struggle is at the center
  of a nexus of powerful people and events, and the repercussions will be felt throughout 
  the Imperium.",

  "TITLE: A Song Of Ice And Fire Series
  AUTHOR: George R.R. Martin
  DESCRIPTION: As the Seven Kingdoms face a generation-long winter, the noble Stark 
  family confronts the poisonous plots of the rival Lannisters, the emergence of the 
  White Walkers, the arrival of barbarian hordes, and other threats."
]

I've tried parsing the string with split, but there is no single character to split on, since periods, commas, and newlines all appear inside the descriptions. I think a regex is the way to go, but I'm open to other suggestions.

Upvotes: 0

Views: 61

Answers (3)

mfulton26

Reputation: 31234

You can use a positive lookahead:

data.split(/(?=TITLE:)/g);

If you don't want the trailing whitespace then you can remove it in the split pattern:

data.split(/\s*(?=TITLE:)/g);
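
As a rough illustration of the difference between the two patterns, here is a minimal sketch on a made-up two-record sample (the `sample`, `raw`, and `trimmed` names and the record contents are hypothetical, not taken from the question). The lookahead is zero-width, so "TITLE:" stays at the start of each piece; putting `\s*` in front of it makes the whitespace part of the separator, so it is dropped.

```javascript
// Hypothetical sample in the same layout as the question's data
// (built with join so the whitespace is explicit).
const sample = [
  "TITLE: Dune",
  "  AUTHOR: Frank Herbert",
  "  DESCRIPTION: Desert planet.",
  "  TITLE: Emma",
  "  AUTHOR: Jane Austen",
  "  DESCRIPTION: A matchmaker errs.",
].join("\n");

// Zero-width lookahead: split before each "TITLE:" without consuming it.
// The whitespace in front of the next "TITLE:" stays on the previous piece.
const raw = sample.split(/(?=TITLE:)/);

// \s* is consumed as part of the separator, so that whitespace disappears.
const trimmed = sample.split(/\s*(?=TITLE:)/);

console.log(JSON.stringify(raw[0]));     // ends with "\n  "
console.log(JSON.stringify(trimmed[0])); // ends with "planet."
```

Lines inside each record keep their leading indentation in both cases; only the whitespace directly in front of a "TITLE:" is affected.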

Upvotes: 1

ggorlen

Reputation: 56975

/(?=TITLE: )/g seems like a reasonable start. I'm not sure whether the two-space gutter is in your original text or not, but adding ^ (optionally followed by the two spaces, or by " *" to allow any number of spaces) to the front of the lookahead, together with the m flag, helps avoid false positives, i.e. /(?=^TITLE: )/mg, /(?=^  TITLE: )/mg or /(?=^ *TITLE: )/mg.

const data = `TITLE: Hitchhiker's Guide to the Galaxy
  AUTHOR: Douglas Adams
  DESCRIPTION: Seconds before the Earth is demolished to make way for a galactic freeway,
  Arthur Dent is plucked off the planet by his friend Ford Prefect, a researcher for the 
  revised edition of The Hitchhiker's Guide to the Galaxy who, for the last fifteen 
  years, has been posing as an out-of-work actor. 
  TITLE: Dune
  AUTHOR: Frank Herbert
  DESCRIPTION: The troubles begin when stewardship of Arrakis is transferred by the
  Emperor from the Harkonnen Noble House to House Atreides. The Harkonnens don't want to
  give up their privilege, though, and through sabotage and treachery they cast young 
  Duke Paul Atreides out into the planet's harsh environment to die. There he falls in 
  with the Fremen, a tribe of desert dwellers who become the basis of the army with which
  he will reclaim what's rightfully his. Paul Atreides, though, is far more than just a 
  usurped duke. He might be the end product of a very long-term genetic experiment 
  designed to breed a super human; he might be a messiah. His struggle is at the center
  of a nexus of powerful people and events, and the repercussions will be felt throughout 
  the Imperium.
  TITLE: A Song Of Ice And Fire Series
  AUTHOR: George R.R. Martin
  DESCRIPTION: As the Seven Kingdoms face a generation-long winter, the noble Stark 
  family confronts the poisonous plots of the rival Lannisters, the emergence of the 
  White Walkers, the arrival of barbarian hordes, and other threats.`;
  
console.log(data.split(/(?=TITLE: )/g));
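
To make the false-positive point concrete, here is a small sketch with an invented sample whose description happens to contain "TITLE: " in the middle of a line (the sample text and the `loose`, `anchored`, and `records` names are assumptions for illustration only). With the `m` flag, `^` matches at the start of every line, so only a "TITLE: " that begins a line (after optional spaces) starts a new record.

```javascript
// Hypothetical sample: the first description mentions "SUBTITLE: " and
// "TITLE: " mid-line, which should not start a new record.
const sample = [
  "TITLE: Style Guide",
  "  AUTHOR: A. Writer",
  "  DESCRIPTION: Explains why SUBTITLE: Draft or a mid-sentence TITLE: label",
  "  should not start a new record.",
  "  TITLE: Dune",
  "  AUTHOR: Frank Herbert",
  "  DESCRIPTION: Desert planet.",
].join("\n");

// Unanchored lookahead: splits at every "TITLE: ", including the two
// occurrences inside the description.
const loose = sample.split(/(?=TITLE: )/);

// Anchored lookahead with the m flag: only splits where a line starts
// with optional spaces followed by "TITLE: ".
const anchored = sample.split(/(?=^ *TITLE: )/m);

// The anchored split happens at the start of the line, so the gutter stays
// on the following piece; trim() removes it along with the trailing newline.
const records = anchored.map((s) => s.trim());

console.log(loose.length);    // 4
console.log(anchored.length); // 2
console.log(records[1].split("\n")[0]); // TITLE: Dune
```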

Upvotes: 3

Kinglish

Reputation: 23654

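As a backup plan, you can split on the plain string "TITLE:" and use map() to put the prefix back on each block; slice(1) drops the empty piece before the first "TITLE:".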

const ndata = data.split("TITLE:").map(block => "TITLE:" + block).slice(1);

const data = `TITLE: Hitchhiker's Guide to the Galaxy
  AUTHOR: Douglas Adams
  DESCRIPTION: Seconds before the Earth is demolished to make way for a galactic freeway,
  Arthur Dent is plucked off the planet by his friend Ford Prefect, a researcher for the 
  revised edition of The Hitchhiker's Guide to the Galaxy who, for the last fifteen 
  years, has been posing as an out-of-work actor. 
  TITLE: Dune
  AUTHOR: Frank Herbert
  DESCRIPTION: The troubles begin when stewardship of Arrakis is transferred by the
  Emperor from the Harkonnen Noble House to House Atreides. The Harkonnens don't want to
  give up their privilege, though, and through sabotage and treachery they cast young 
  Duke Paul Atreides out into the planet's harsh environment to die. There he falls in 
  with the Fremen, a tribe of desert dwellers who become the basis of the army with which
  he will reclaim what's rightfully his. Paul Atreides, though, is far more than just a 
  usurped duke. He might be the end product of a very long-term genetic experiment 
  designed to breed a super human; he might be a messiah. His struggle is at the center
  of a nexus of powerful people and events, and the repercussions will be felt throughout 
  the Imperium.
  TITLE: A Song Of Ice And Fire Series
  AUTHOR: George R.R. Martin
  DESCRIPTION: As the Seven Kingdoms face a generation-long winter, the noble Stark 
  family confronts the poisonous plots of the rival Lannisters, the emergence of the 
  White Walkers, the arrival of barbarian hordes, and other threats.`

const ndata = data.split("TITLE:").map(block => "TITLE:" + block).slice(1);
console.log(ndata)
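
If the whitespace left at the end of each block matters, one possible variation is to trim each block while the prefix is added back. This is only a sketch on a made-up two-record sample (the `sample` and `blocks` names and the record contents are hypothetical):

```javascript
// Hypothetical sample in the same layout as the question's data.
const sample = [
  "TITLE: Dune",
  "  AUTHOR: Frank Herbert",
  "  DESCRIPTION: Desert planet.",
  "  TITLE: Emma",
  "  AUTHOR: Jane Austen",
  "  DESCRIPTION: A matchmaker errs.",
].join("\n");

const blocks = sample
  .split("TITLE:")   // the separator itself is removed from every piece
  .slice(1)          // drop whatever came before the first "TITLE:" (here: "")
  .map((block) => "TITLE:" + block.trimEnd()); // restore prefix, drop trailing whitespace

console.log(blocks.length);             // 2
console.log(JSON.stringify(blocks[0]));
```

Note that a plain-string split also fires on any "TITLE:" that appears inside a description, so this assumes the keyword only ever occurs as a field label.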

Upvotes: 2
