Reputation: 3182
I have a directory full of files that have date strings as part of the filenames:
file_type_1_20140722_foo.txt
file_type_two_20140723_bar.txt
filetypethree20140724qux.txt
I need to get these date strings from the filenames and save them in an array:
['20140722', '20140723', '20140724']
But they can appear at various places in the filename, so I can't just use substring notation and extract it directly. In the past, the way I've done something similar to this in Bash is like so:
date=$(echo $file | egrep -o '[[:digit:]]{8}' | head -n1)
But I can't use Bash for this because it sucks at math (I need to be able to add and subtract floating point numbers). I've tried glob.glob() and re.match(), but both return empty sets:
>>> dates = [file for file in sorted(os.listdir('.')) if re.match("[0-9]{8}", file)]
>>> print dates
[]
I know the problem is it's looking for complete file names that are eight digits long, but I have no idea how to make it look for substrings instead. Any ideas?
Upvotes: 1
Views: 14536
Reputation: 881037
>>> import re
>>> import os
>>> [date for file in os.listdir('.') for date in re.findall(r"\d{8}", file)]
['20140722', '20140723']
Note that if a filename has a 9-digit substring, then only the first 8 digits will be matched. If a filename contains a 16-digit substring, there will be 2 non-overlapping matches.
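To make the note above concrete, here is a small sketch that runs the same pattern over a 9-digit and a 16-digit run. The filenames are made up for illustration, and the `strict` pattern with lookarounds is an assumption of mine, not part of the answer: it is one possible way to accept only runs of exactly eight digits, if that is what you need.

```python
import re

# Plain pattern from the answer: any run of eight digits,
# even when it sits inside a longer run of digits.
loose = re.compile(r"\d{8}")

# Stricter variant (an assumption, not from the answer): exactly eight
# digits, with no digit immediately before or after the match.
strict = re.compile(r"(?<!\d)\d{8}(?!\d)")

# Hypothetical filenames, chosen only to show the edge cases.
nine = "log_201407221_x.txt"            # contains a 9-digit run
sixteen = "span_2014072220140723.txt"   # contains a 16-digit run

loose_nine = loose.findall(nine)        # only the first 8 digits of the run
loose_sixteen = loose.findall(sixteen)  # two non-overlapping matches
strict_nine = strict.findall(nine)      # nothing: the run is too long
strict_ok = strict.findall("file_type_1_20140722_foo.txt")

print(loose_nine, loose_sixteen, strict_nine, strict_ok)
```

Whether the loose or the strict behaviour is right depends on what other numbers can show up in your filenames.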
Upvotes: 6
Reputation: 42788
re.match matches only at the beginning of the string. re.search matches the pattern anywhere in the string.
Or you can try this:
import os
import re

extract_dates = re.compile(r"[0-9]{8}").findall
# Keep the first match from each filename; skip names with no 8-digit run.
dates = sorted(found[0] for found in map(extract_dates, os.listdir('.')) if found)
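The difference between re.match and re.search described above can be checked directly against the three filenames from the question. This is only a sketch: the `names` list stands in for os.listdir('.') so it runs without touching the filesystem.

```python
import re

# The example filenames from the question, used in place of os.listdir('.').
names = [
    "file_type_1_20140722_foo.txt",
    "file_type_two_20140723_bar.txt",
    "filetypethree20140724qux.txt",
]
pattern = re.compile(r"[0-9]{8}")

# match() only tries position 0, and every name starts with a letter,
# so nothing matches. This reproduces the empty list from the question.
matched = [n for n in names if pattern.match(n)]

# search() scans the whole string and returns the first occurrence.
found = [m.group(0) for m in map(pattern.search, names) if m]

print(matched)
print(found)
```

So the original pattern was fine. It only failed because re.match never looks past the first character of each name.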
Upvotes: 1
Reputation: 3196
Your regular expression looks good, but you should be using re.search instead of re.match so that it will search for that expression anywhere in the string:
import re
r = re.compile(r"[0-9]{8}")
m = r.search(filename)  # filename is one name from os.listdir('.')
if m:
    print(m.group(0))
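If other eight-digit numbers (IDs, counters) might appear in the filenames, one option is to keep only runs that parse as a real date. The sketch below assumes the dates are in YYYYMMDD form, which the question's examples suggest but do not state. The helper names `first_date` and `collect_dates` are hypothetical, introduced here for illustration.

```python
import re
from datetime import datetime

DATE_RE = re.compile(r"[0-9]{8}")

def first_date(filename):
    """Return the first 8-digit run that parses as YYYYMMDD, else None."""
    for candidate in DATE_RE.findall(filename):
        try:
            # Assumption: dates are written as YYYYMMDD.
            datetime.strptime(candidate, "%Y%m%d")
        except ValueError:
            continue  # eight digits, but not a calendar date
        return candidate
    return None

def collect_dates(filenames):
    """Collect one date string per filename, skipping names without one."""
    dates = [first_date(name) for name in filenames]
    return sorted(d for d in dates if d is not None)
```

You would call it as collect_dates(os.listdir('.')). Names with no valid date are silently skipped.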
Upvotes: 2