Reputation: 364
I have some files with a date embedded in their names, e.g. foo161108part.txt, baarr161108part2.txt, python141106part2.txt.
So far I've listed the .txt files in the directory with:
import os

files = []
for name in os.listdir(os.getcwd()):
    if name.endswith('.txt'):
        files.append(name)
print(files)
There are many files with different dates, and I'd like to count how many files share each date.
Thanks!
Upvotes: 1
Views: 592
Reputation: 92904
If the date part is what you need to find in each file name, consider the following approach:
import os
import re

counts = {}
pattern = re.compile(r'^.*(\d{6}).*?$')
for f in os.listdir('text_files'):
    m = pattern.match(f)  # reuse the compiled pattern object
    if m:
        date_value = m.group(1)
        counts[date_value] = counts.get(date_value, 0) + 1
print(counts)
The output:
{'161108': 2, '141106': 1}
As for the regex: per the Python re documentation, using re.compile() and saving the resulting regular expression object for reuse is more efficient when the expression will be used several times in a single program.
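A minimal sketch of that idea, assuming the date is the first six consecutive digits in each name: the pattern is compiled once and reused for every file name, and the names are grouped by date so you can see which files share a date, not just how many. DATE_PATTERN and group_by_date are hypothetical names, and a fixed list stands in for os.listdir(). Note that re.search with (\d{6}) picks the first six consecutive digits, whereas the greedy ^.* in the patterns in this thread picks the last six; for these example names both give the same date.

```python
import re
from collections import defaultdict

# Compiled once, reused for every name.
# Assumption: the date is the first six consecutive digits in the name.
DATE_PATTERN = re.compile(r'(\d{6})')

def group_by_date(names):
    """Map each six-digit date string to the .txt names that contain it."""
    groups = defaultdict(list)
    for name in names:
        if not name.endswith('.txt'):
            continue  # ignore non-text files, as in the question
        m = DATE_PATTERN.search(name)
        if m:
            groups[m.group(1)].append(name)
    return dict(groups)

names = ['foo161108part.txt', 'baarr161108part2.txt', 'python141106part2.txt']
by_date = group_by_date(names)
for date, matched in by_date.items():
    # date, number of files, and the files themselves
    print(date, len(matched), matched)
```

In a real directory you would pass os.listdir('text_files') (or whichever directory you use) instead of the fixed list.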
Upvotes: 1
Reputation: 338
You can use a regex and Python's collections.Counter class for this:
import re
from collections import Counter

files = ['foo161108part.txt', 'baarr161108part2.txt', 'python141106part2.txt']
dates = []
for f in files:
    m = re.match(r"^.*(\d{6}).*\.txt$", f)
    if m:
        dates.append(m.group(1))  # the six-digit date
print(dates)
print(Counter(dates))
Output:
['161108', '161108', '141106']
Counter({'161108': 2, '141106': 1})
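If the six digits are a YYMMDD date (an assumption; the question doesn't say so explicitly), Counter can count real date objects instead of strings. That rejects six-digit runs that aren't valid dates and lets you sort the results chronologically. A sketch under that assumption; parse_date is a hypothetical helper:

```python
import re
from collections import Counter
from datetime import datetime

files = ['foo161108part.txt', 'baarr161108part2.txt', 'python141106part2.txt']

def parse_date(name):
    """Return the date embedded in name, or None if there isn't a valid one."""
    m = re.search(r'(\d{6})', name)
    if not m:
        return None
    try:
        # Assumption: the six digits are year, month, day (YYMMDD).
        return datetime.strptime(m.group(1), '%y%m%d').date()
    except ValueError:
        # Six digits that don't form a valid date, e.g. '999999'.
        return None

counts = Counter(d for d in map(parse_date, files) if d is not None)
for day, n in sorted(counts.items()):
    print(day.isoformat(), n)
```

With the example list this prints 2014-11-06 1 followed by 2016-11-08 2. counts.most_common() would instead list the busiest dates first.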
Upvotes: 0
Reputation: 48120
If the aim is just to compare the contents of the files, the filecmp module is a good fit. It provides the filecmp.cmp() function, which, per its documentation, does the following:
Compare the files named f1 and f2, returning True if they seem equal, False otherwise.
Example:
>>> import filecmp
>>> filecmp.cmp('undoc.rst', 'undoc.rst')
True
>>> filecmp.cmp('undoc.rst', 'index.rst')
False
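By default filecmp.cmp() is shallow: if the os.stat() signatures (file type, size and modification time) of both files match, they are reported equal without their contents being read. A small sketch that passes shallow=False to force a content comparison, using throwaway files in a temporary directory (the file names and contents are made up for illustration):

```python
import filecmp
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Write three hypothetical files: two identical, one different.
    paths = {}
    for name, text in [('a.txt', 'same\n'), ('b.txt', 'same\n'), ('c.txt', 'different\n')]:
        paths[name] = os.path.join(tmp, name)
        with open(paths[name], 'w') as fh:
            fh.write(text)

    # shallow=False compares the actual bytes instead of trusting
    # matching os.stat() signatures.
    same = filecmp.cmp(paths['a.txt'], paths['b.txt'], shallow=False)
    diff = filecmp.cmp(paths['a.txt'], paths['c.txt'], shallow=False)

print(same, diff)
```

This prints True False.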
Upvotes: 0