Reputation: 1474
I have a string that I want to split on the date:
28/11/2016 Mushroom 05/12/2016 Carrot 12/12/2016 Broccoli 19/12/2016 Potato
which should end up as
28/11/2016 Mushroom
05/12/2016 Carrot
12/12/2016 Broccoli
19/12/2016 Potato
The dates vary, so I can't split on a fixed string. I've worked out the regex, but I can't figure out how to keep the delimiter (the date) in the result.
import re
s = "28/11/2016 Mushroom 05/12/2016 Carrot 12/12/2016 Broccoli 19/12/2016 Potato"
replaced = re.sub(r"\d{2}\/\d{2}\/\d{4}\s*", ",", s)  # loses data (the dates are replaced)
print(replaced)
g = re.match(r"(\d{2}\/\d{2}\/\d{4}\s*)(.*)", s)
if g:
    # replaced = s.replace(g.group(0), "\n" + g.group(0))  # fails
    # print(replaced)
    pass
Upvotes: 2
Views: 5046
Reputation: 626738
You may use a splitting approach if there is always whitespace between the dates:
\s+(?=\d+/\d+/\d+\s)
See the regex demo
Details:
\s+
- matches 1+ whitespace characters
(?=\d+/\d+/\d+\s)
- that are followed by 1+ digits, then / and 1+ digits twice (the date-like pattern), and then a whitespace character
See a Python demo below:
import re
rx = r"\s+(?=\d+/\d+/\d+\s)"
s = "28/11/2016 Mushroom 05/12/2016 Carrot 12/12/2016 Broccoli 19/12/2016 Potato"
results = re.split(rx, s)
print(results)
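To get the one-entry-per-line output asked for in the question, the split pieces can be joined with newlines, or the matched whitespace can be replaced directly. This is a minimal sketch that assumes the same input string and pattern as above; the names `parts`, `joined`, and `replaced` are only illustrative.

```python
import re

s = "28/11/2016 Mushroom 05/12/2016 Carrot 12/12/2016 Broccoli 19/12/2016 Potato"
rx = r"\s+(?=\d+/\d+/\d+\s)"

# The lookahead only checks that a date follows; it does not consume it,
# so every date stays in the resulting pieces.
parts = re.split(rx, s)

# Option 1: join the pieces with newlines.
joined = "\n".join(parts)

# Option 2: replace the whitespace before each date with a newline.
replaced = re.sub(rx, "\n", s)

print(joined)
```

Both options should give the same text here, because the only thing the pattern consumes is the whitespace in front of each date.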
Alternatively, a more complex regex can be used to actually match those dates:
\b\d+/\d+/\d+.*?(?=\s*\b\d+/\d+/\d+|$)
See the regex demo and a Python demo:
import re
rx = r"\b\d+/\d+/\d+.*?(?=\s*\b\d+/\d+/\d+|$)"
s = "28/11/2016 Mushroom 05/12/2016 Carrot 12/12/2016 Broccoli 19/12/2016 Potato"
results = re.findall(rx, s)
print(results)
Here,
\b\d+/\d+/\d+
- matches a word boundary and a date-like pattern
.*?
- any 0+ chars other than line breaks, as few as possible, up to the first location that is followed with...
(?=\s*\b\d+/\d+/\d+|$)
- 0+ whitespaces and a date-like pattern OR the end of string ($).
Upvotes: 1