Mr Mystery Guest

Reputation: 1474

Regex to split string on date and keep it

I have a string that I want to split on the date:

28/11/2016 Mushroom 05/12/2016 Carrot 12/12/2016 Broccoli 19/12/2016 Potato

which should end up as

 28/11/2016 Mushroom
 05/12/2016 Carrot
 12/12/2016 Broccoli
 19/12/2016 Potato

The date changes from entry to entry, which makes it difficult. I've worked out the regex, but I can't figure out how to keep the delimiter (the date) as well.

import re

s = "28/11/2016 Mushroom 05/12/2016 Carrot 12/12/2016 Broccoli 19/12/2016 Potato"

# Attempt 1: replacing each date with a comma loses the dates
replaced = re.sub(r"\d{2}\/\d{2}\/\d{4}\s*", ",", s)
print(replaced)

# Attempt 2: re.match only matches once, at the start of the string
g = re.match(r"(\d{2}\/\d{2}\/\d{4}\s*)(.*)", s)

if g:
    # replaced = s.replace(g.group(0), "\n" + g.group(0))  # fails
    # print(replaced)
    pass

Upvotes: 2

Views: 5046

Answers (1)

Wiktor Stribiżew

Reputation: 626738

You may use a splitting approach if there is always whitespace between the dates:

\s+(?=\d+/\d+/\d+\s)

See the regex demo

Details:

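  • \s+ - matches 1+ whitespace characters
  • (?=\d+/\d+/\d+\s) - a positive lookahead requiring that those whitespace characters are immediately followed by 1+ digits, then / and 1+ digits twice (the date-like pattern), and then a whitespace character. The lookahead does not consume the date, so the date stays in the result.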

See a Python demo below:

import re
rx = r"\s+(?=\d+/\d+/\d+\s)"
s = "28/11/2016 Mushroom 05/12/2016 Carrot 12/12/2016 Broccoli 19/12/2016 Potato"
results = re.split(rx, s)
print(results)
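To get the exact layout asked for in the question, the split parts can be joined with newlines. This is only a minimal sketch using the same pattern and sample string; the names parts and output are chosen here for illustration:

```python
import re

s = "28/11/2016 Mushroom 05/12/2016 Carrot 12/12/2016 Broccoli 19/12/2016 Potato"

# Split on whitespace that is followed by a date-like token; the lookahead
# does not consume the date, so each part keeps its leading date.
parts = re.split(r"\s+(?=\d+/\d+/\d+\s)", s)

# One entry per line, as in the desired output.
output = "\n".join(parts)
print(output)
```

Printing output gives four lines, each starting with its date.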

Alternatively, a more complex regex can be used to actually match those dates:

\b\d+/\d+/\d+.*?(?=\s*\b\d+/\d+/\d+|$)

See the regex demo and a Python demo:

import re
rx = r"\b\d+/\d+/\d+.*?(?=\s*\b\d+/\d+/\d+|$)"  # \s* in the lookahead keeps trailing whitespace out of each match
s = "28/11/2016 Mushroom 05/12/2016 Carrot 12/12/2016 Broccoli 19/12/2016 Potato"
results = re.findall(rx, s)
print(results)

Here,

  • \b\d+/\d+/\d+ - matches a word boundary and a date-like pattern
  • .*? - any 0+ chars, as few as possible up to the first location that is followed with...
  • (?=\s*\b\d+/\d+/\d+|$) - 0+ whitespaces and a date-like pattern OR the end of string ($).
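If the date and the text after it are needed separately, the matching pattern described above can be extended with two capture groups. This is a sketch under that assumption, not part of the original pattern; the names PAIR_RX and split_entries are hypothetical:

```python
import re

# Same structure as the matching regex, with the date and the text that
# follows it wrapped in capture groups (an addition made for this sketch).
PAIR_RX = re.compile(r"\b(\d+/\d+/\d+)\s*(.*?)(?=\s*\b\d+/\d+/\d+|$)")

def split_entries(text):
    """Return a list of (date, item) tuples found in text."""
    # With two groups, findall returns one tuple per match.
    return PAIR_RX.findall(text)

s = "28/11/2016 Mushroom 05/12/2016 Carrot 12/12/2016 Broccoli 19/12/2016 Potato"
pairs = split_entries(s)
for date, item in pairs:
    print(date, item)
```

Because . does not match a newline by default, this sketch assumes each date and its text sit on a single line, as in the sample string.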

Upvotes: 1
