Reputation: 25
This is the string I am dealing with: '5Nov20217Dec202110Jan2022'
The string could also be:
'5Nov2021 7Dec2021 10Jan2022'
I would like to obtain a list like:
['5Nov2021','7Dec2021','10Jan2022']
I am currently using regex but to no avail:
re.findall('^\d{1,2}[a-zA-Z]{3}\d{4}$','5Nov20217Dec202110Jan2022')
A regex solution is not a must.
Upvotes: 2
Views: 55
Reputation: 71461
Based on the variability of your input, I suggest combining re with string slicing in a while loop:
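import re

def extract_dates(d):
    while d:
        # Prefer a four-digit year, but only if the date is followed by the end
        # of the string, a digit (the next date's day), or a space
        # (as in '5Nov2021 7Dec2021').
        if (k := re.findall(r'^\d{1,2}[a-zA-Z]{3}\d{4}', d)):
            if not (l := d[len(k[0]):]) or l[0].isdigit() or l[0].isspace():
                yield k[0]
                d = l
                continue
        # Otherwise fall back to a two-digit year.
        if (k := re.findall(r'^\d{1,2}[a-zA-Z]{3}\d{2}', d)):
            yield k[0]
            d = d[len(k[0]):]
        else:
            # No date starts here (e.g. a separator): skip one character.
            d = d[1:]

dates = ['5Nov20217Dec202110Jan2022', '5Nov217Dec2110Jan22', '5Nov21 7Dec21 10Jan22']
results = [list(extract_dates(i)) for i in dates]
print(results)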
Output:
[['5Nov2021', '7Dec2021', '10Jan2022'], ['5Nov21', '7Dec21', '10Jan22'], ['5Nov21', '7Dec21', '10Jan22']]
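If the year is always four digits, as in both strings from the question, a plain re.findall without the ^ and $ anchors may already be enough. The anchors in the original attempt require the whole string to be a single date, which is why it returns an empty list. The following is a minimal sketch under that four-digit assumption; the names DATE_4 and find_dates are only illustrative.

```python
import re

# Day (1-2 digits), three-letter month, four-digit year.
# No ^/$ anchors, so findall can return every date in the string.
# Assumption: years are always four digits.
DATE_4 = re.compile(r'\d{1,2}[a-zA-Z]{3}\d{4}')

def find_dates(s):
    """Return every date-like token found in s, left to right."""
    return DATE_4.findall(s)

print(find_dates('5Nov20217Dec202110Jan2022'))
# ['5Nov2021', '7Dec2021', '10Jan2022']
print(find_dates('5Nov2021 7Dec2021 10Jan2022'))
# ['5Nov2021', '7Dec2021', '10Jan2022']
```

Because the year width is fixed, scanning left to right is unambiguous here. Once two-digit years can also appear, that no longer holds, which is the case the loop above is meant to handle.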
Upvotes: 4