Reputation: 47
Here is a list of filenames that contain timestamps. I need to loop through the list, extract only the timestamp portion of each name, and convert it to a timestamp.
import re
s = ['Asbdnfe_20200404_000101.csv',
     'sdndvd_20200404_010202.csv',
     'vdfvdfvdfvd_20190303_030303.csv']
length = len(s)
for i in range(length):
    match = re.search(r"_((\d+)_(\d+))", s[i])
    print(match.group(1))
Result: 20200404_000101, 20200404_010202, 20190303_030303
But what I want is:
[2020-04-04 00:01:01.000,
2020-04-04 01:02:02.000,
2019-03-03 03:03:03.000]
Upvotes: 0
Views: 3556
Reputation: 565
Whenever you need to do the same thing to a bunch of similar inputs, look for a common pattern and start there. In this case, the pattern is pretty simple, so the regex is actually overkill.
from datetime import datetime
from pathlib import Path
s = ['Asbdnfe_20200404_000101.csv',
     'sdndvd_20200404_010202.csv',
     'vdfvdfvdfvd_20190303_030303.csv']
datetimes = []
for filename in s:
    name = Path(filename).stem  # or os.path.splitext(filename)[0]
    timestamp_str = name[-15:]  # the last 15 characters are YYYYMMDD_HHMMSS
    file_dt = datetime.strptime(timestamp_str, '%Y%m%d_%H%M%S')
    datetimes.append(file_dt)
All your file names are in the form <some_prefix>_<YYYYMMDD>_<HHMMSS>.csv. So no matter what <some_prefix> is, you can index the string from the right and pull out the date and time information in the same way every time.
And as others have noted, once you do, the datetime module's strptime function exists exactly for this use.
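If counting characters from the right feels fragile, splitting from the right is one alternative. This is only a sketch of the same idea; the variable names are mine, and it assumes the date and time are always the last two underscore-separated fields of the stem.

```python
from datetime import datetime
from pathlib import Path

filename = 'vdfvdfvdfvd_20190303_030303.csv'

# Drop the extension, then split at most twice from the right so that
# any underscores inside the prefix are left untouched.
stem = Path(filename).stem
prefix, date_part, time_part = stem.rsplit('_', 2)

# Parse the two numeric fields together.
parsed = datetime.strptime(date_part + time_part, '%Y%m%d%H%M%S')
print(prefix, parsed)
```

Because of the maxsplit of 2, a prefix that itself contains underscores still ends up whole in prefix.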
Even if you have a case where the inputs aren't as clean and regular as the few file names you posted, just look for a slightly more abstract pattern and write code around that.
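As a rough illustration of "a slightly more abstract pattern", here is a sketch that looks for exactly 8 digits, an underscore, and 6 digits right before the extension or the end of the name, and skips anything that doesn't fit. The helper name extract_datetime, the pattern, and the extra file names are my own assumptions, not something from the question.

```python
import re
from datetime import datetime

# Assumed pattern: "_YYYYMMDD_HHMMSS" followed by a dot or the end of the name.
TIMESTAMP_RE = re.compile(r"_(\d{8}_\d{6})(?=\.|$)")

def extract_datetime(filename):
    """Return the datetime embedded in filename, or None if there is none."""
    match = TIMESTAMP_RE.search(filename)
    if match is None:
        return None
    try:
        return datetime.strptime(match.group(1), "%Y%m%d_%H%M%S")
    except ValueError:
        # The digits were there but do not form a real date and time.
        return None

# Hypothetical file names, including one without a timestamp.
names = ['Asbdnfe_20200404_000101.csv',
         'report_v2_20190303_030303.backup.csv',
         'no_timestamp.csv']
parsed = [extract_datetime(n) for n in names]
print(parsed)
```

Returning None instead of raising lets you filter out the odd file names afterwards; if a bad name should stop the run, raise there instead.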
Upvotes: 1
Reputation: 73
You can use datetime parsing and formatting as follows:
from datetime import datetime
import re
s = ['Asbdnfe_20200404_000101.csv',
'sdndvd_20200404_010202.csv',
'vdfvdfvdfvd_20190303_030303.csv']
length = len(s)
for i in range(length):
    match = re.search(r"_((\d+)_(\d+))", s[i])
    #print(match.group(1))
    print(datetime.strptime(match.group(1), '%Y%m%d_%H%M%S').strftime('%Y-%m-%d %H:%M:%S.%f')[:-3])
You will get the output as
2020-04-04 00:01:01.000
2020-04-04 01:02:02.000
2019-03-03 03:03:03.000
Upvotes: 1
Reputation: 147146
You can use datetime.strptime to convert the extracted strings into datetime objects:
from datetime import datetime
import re
s = ['Asbdnfe_20200404_000101.csv','sdndvd_20200404_010202.csv','vdfvdfvdfvd_20190303_030303.csv']
for f in s:
    match = re.search(r"_((\d+)_(\d+))", f)
    d = datetime.strptime(match.group(1), '%Y%m%d_%H%M%S')
    print(d)
Output:
2020-04-04 00:01:01
2020-04-04 01:02:02
2019-03-03 03:03:03
If you want to print the dates with milliseconds, use datetime.strftime:
print(d.strftime('%Y-%m-%d %H:%M:%S.%f')[:-3])
The %f specifier prints microseconds (six digits), so we use [:-3] to strip it back to a millisecond value.
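If the slicing looks too magic, datetime.isoformat with timespec='milliseconds' (available since Python 3.6) should give the same string without it. A small sketch comparing the two, using a hard-coded datetime rather than one parsed from a file name:

```python
from datetime import datetime

d = datetime(2020, 4, 4, 0, 1, 1)

# strftime gives six microsecond digits; drop the last three.
sliced = d.strftime('%Y-%m-%d %H:%M:%S.%f')[:-3]

# isoformat can truncate to milliseconds itself; sep=' ' replaces the default 'T'.
iso = d.isoformat(sep=' ', timespec='milliseconds')

print(sliced)
print(iso)
```

Both approaches truncate rather than round, so they agree for non-zero microseconds too.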
To produce a list of results, just append them to a list rather than printing them:
d = []
for f in s:
    match = re.search(r"_((\d+)_(\d+))", f)
    dt = datetime.strptime(match.group(1), '%Y%m%d_%H%M%S')
    d.append(dt.strftime('%Y-%m-%d %H:%M:%S.%f')[:-3])
print(d)
Or you can use a list comprehension:
d = [datetime.strptime(re.search(r"_((\d+)_(\d+))", f).group(1), '%Y%m%d_%H%M%S').strftime('%Y-%m-%d %H:%M:%S.%f')[:-3] for f in s]
The output is the same:
['2020-04-04 00:01:01.000', '2020-04-04 01:02:02.000', '2019-03-03 03:03:03.000']
Upvotes: 5
Reputation: 1531
You can use datetime:
from datetime import datetime
import re
s = ['Asbdnfe_20200404_000101.csv',
'sdndvd_20200404_010202.csv',
'vdfvdfvdfvd_20190303_030303.csv']
length = len(s)
for i in range(length):
    match = re.search(r"_((\d+)_(\d+))", s[i])
    time_str = match.group(1)
    print(datetime.strptime(time_str, "%Y%m%d_%H%M%S").strftime("%Y-%m-%d %H:%M:%S"))
Upvotes: 0