user16452533

Regex to extract each alphanumeric pattern

I have a string like the following:

19990101 - John DoeLorem ipsum dolor sit amet 19990102 - Elton Johnconsectetur adipiscing elit

How can I write a regex that would give me these two separate strings?

19990101 - John DoeLorem ipsum dolor sit amet

19990102 - Elton Johnconsectetur adipiscing elit

The regex I wrote only gets this far:

/\d+ -/gm

But I don't know how to include the letters that follow as well.

Upvotes: 2

Views: 122

Answers (2)

Peter Seliger

Reputation: 13432

For the OP's use case, a regex-based split such as str.split(/(?<=\w)\s+(?=\d)/) should already do it.

The regex uses lookarounds: it matches any whitespace sequence (\s+) that is preceded, via the lookbehind (?<= ... ), by a word character (\w) and followed, via the lookahead (?= ... ), by a digit (\d).

console.log(
  '19990101 - John DoeLorem ipsum dolor sit amet 19990102 - Elton Johnconsectetur adipiscing elit 19990101 - John DoeLorem ipsum dolor sit amet 19990102 - Elton Johnconsectetur adipiscing elit'
    .split(/(?<=\w)\s+(?=\d)/)
);
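
To see why the lookarounds matter, here is a small sketch comparing the pattern above with a hypothetical variant that drops them (the `withoutLookarounds` pattern is not from this answer, it is only there for illustration). Without lookarounds, the neighbouring characters become part of the separator and disappear from the result. Note that the lookbehind `(?<= ... )` needs an ES2018-capable engine.

```javascript
const input = '19990101 - John DoeLorem ipsum dolor sit amet 19990102 - Elton Johnconsectetur adipiscing elit';

// Pattern from the answer: only the whitespace is consumed as the separator.
const withLookarounds = input.split(/(?<=\w)\s+(?=\d)/);

// Hypothetical variant for comparison: \w and \d are consumed too,
// so the "t" of "amet" and the leading "1" of the second date are lost.
const withoutLookarounds = input.split(/\w\s+\d/);

console.log(withLookarounds);
// [ '19990101 - John DoeLorem ipsum dolor sit amet',
//   '19990102 - Elton Johnconsectetur adipiscing elit' ]
console.log(withoutLookarounds);
// [ '19990101 - John DoeLorem ipsum dolor sit ame',
//   '9990102 - Elton Johnconsectetur adipiscing elit' ]
```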

Upvotes: 1

Wiktor Stribiżew

Reputation: 627100

You can use either of these:

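const text = '19990101 - John DoeLorem ipsum dolor sit amet 19990102 - Elton Johnconsectetur adipiscing elit';
// Extract each record: digits, whitespace, hyphen, then alphanumerics/whitespace ending in a letter
console.log(text.match(/\d+\s+-[A-Za-z0-9\s]*[A-Za-z]/g));
// Or split on whitespace that precedes "digits, whitespace, hyphen"
console.log(text.split(/(?!^)\s+(?=\d+\s+-)/));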

The text.match(/\d+\s+-[A-Za-z0-9\s]*[A-Za-z]/g) approach extracts the alphanumeric and whitespace characters that follow the \d+\s+- pattern, ending at the last letter. Details:

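  • \d+ - one or more digits
  • \s+ - one or more whitespace characters
  • - - a hyphen
  • [A-Za-z0-9\s]* - zero or more alphanumeric or whitespace characters
  • [A-Za-z] - a letter, so the match ends on a letter rather than on the whitespace or digits that begin the next record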
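
One consequence of this breakdown is worth checking: the character class only admits ASCII letters, digits, and whitespace, so a match stops at the first character outside that set. A minimal sketch, where `withComma` is a made-up input and not the OP's data:

```javascript
const recordPattern = /\d+\s+-[A-Za-z0-9\s]*[A-Za-z]/g;

const original = '19990101 - John DoeLorem ipsum dolor sit amet 19990102 - Elton Johnconsectetur adipiscing elit';
console.log(original.match(recordPattern));
// [ '19990101 - John DoeLorem ipsum dolor sit amet',
//   '19990102 - Elton Johnconsectetur adipiscing elit' ]

// Made-up input with a comma: the class has no ",", so the first match
// ends at "Doe" and " Lorem ipsum" is not part of any match.
const withComma = '19990101 - John Doe, Lorem ipsum 19990102 - Elton John';
console.log(withComma.match(recordPattern));
// [ '19990101 - John Doe', '19990102 - Elton John' ]
```

If the record text may contain punctuation, the class would need to be widened, or the split-based approach may be the simpler option since it only inspects the separators.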

The text.split(/(?!^)\s+(?=\d+\s+-)/) approach splits the string at one or more whitespace characters that are followed by one or more digits, one or more whitespace characters, and a hyphen:

  • (?!^) - a negative lookahead that fails at the start of the string
  • \s+ - one or more whitespace characters
  • (?=\d+\s+-) - a positive lookahead that matches a location immediately followed by one or more digits + one or more whitespace characters + -.
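
The stricter lookahead matters once the record text itself can contain numbers. Here is a rough sketch comparing it with a looser split that fires before any digit, such as /(?<=\w)\s+(?=\d)/. The `sample` string is made up for illustration and is not the OP's data:

```javascript
// Made-up input: the first record's text contains a bare number ("42").
const sample = '19990101 - Room 42 booked 19990102 - Elton John';

// Splits at whitespace between a word character and any digit.
const looseSplit = sample.split(/(?<=\w)\s+(?=\d)/);

// Splits only at whitespace that precedes "digits, whitespace, hyphen".
const strictSplit = sample.split(/(?!^)\s+(?=\d+\s+-)/);

console.log(looseSplit);
// [ '19990101 - Room', '42 booked', '19990102 - Elton John' ]
console.log(strictSplit);
// [ '19990101 - Room 42 booked', '19990102 - Elton John' ]
```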

Upvotes: 0
