o1n3n21

Reputation: 364

Checking if files have the same date from the name, Python 3

I have some files with a date embedded in their names, e.g. foo161108part.txt, baarr161108part2.txt, python141106part2.txt

So far I've listed the directory with:

import os

files = []
for name in os.listdir(os.getcwd()):
    if name.endswith('.txt'):
        files.append(name)
print(files)

There are quite a lot of files with different dates, and I'd like to count how many files share each date.

Thanks!

Upvotes: 1

Views: 592

Answers (3)

RomanPerekhrest

Reputation: 92904

If the date is the key part to look for within each file name, consider the following approach:

import os
import re

counts = {}
# The greedy ^.* makes the group capture the last run of six digits in the name
pattern = re.compile(r'^.*(\d{6}).*?$')

for f in os.listdir('text_files'):
    m = pattern.match(f)
    if m:
        date_value = m.group(1)
        counts[date_value] = counts.get(date_value, 0) + 1

print(counts)

The output:

{'161108': 2, '141106': 1}

As for regex:

using re.compile() and saving the resulting regular expression object for reuse is more efficient when the expression will be used several times in a single program
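
As a small sketch of that reuse, assuming every date is a run of exactly six digits as in the question's examples, the compiled pattern object can be called directly through its search() method. Note that search() returns the first six-digit run in the name, whereas the greedy pattern above captures the last one; the sample names are taken from the question.

```python
import re

# Compile once, then reuse the same pattern object for every file name
date_pattern = re.compile(r'\d{6}')

names = ['foo161108part.txt', 'baarr161108part2.txt', 'python141106part2.txt']

dates = []
for name in names:
    m = date_pattern.search(name)  # first run of six digits, if any
    if m:
        dates.append(m.group())

print(dates)
```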

Upvotes: 1

Maximilian Matthé

Reputation: 338

You can use a regex together with Python's Counter class for this:

import re
from collections import Counter

files = ['foo161108part.txt','baarr161108part2.txt','python141106part2.txt']

dates = []
for f in files:
    m = re.match(r"^.*(\d{6}).*\.txt$", f)
    if m:
        dates.append(m.group(1))
print(dates)
print(Counter(dates))

Output:

['161108', '161108', '141106']
Counter({'161108': 2, '141106': 1})
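
If you also want to see which files share a date, not just how many, one option is to group the names in a dictionary of lists. This is only a sketch: group_by_date is a hypothetical helper name, and it assumes the date is the last run of six digits in a .txt file name.

```python
import re
from collections import defaultdict

# Same idea as the pattern above: capture the last run of six digits
DATE_RE = re.compile(r'^.*(\d{6}).*\.txt$')

def group_by_date(filenames):
    """Map each six-digit date string to the list of file names containing it."""
    groups = defaultdict(list)
    for name in filenames:
        m = DATE_RE.match(name)
        if m:
            groups[m.group(1)].append(name)
    return dict(groups)

files = ['foo161108part.txt', 'baarr161108part2.txt', 'python141106part2.txt']
groups = group_by_date(files)
for date, names in sorted(groups.items()):
    print(date, len(names), names)
```

Any date whose list holds more than one name is shared by several files.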

Upvotes: 0

Moinuddin Quadri

Reputation: 48120

If the aim is just to compare the contents of the files, the simplest way is to use the filecmp module. It provides the filecmp.cmp() function, which will:

Compare the files named f1 and f2, returning True if they seem equal, False otherwise.

Example:

>>> import filecmp
>>> filecmp.cmp('undoc.rst', 'undoc.rst') 
True
>>> filecmp.cmp('undoc.rst', 'index.rst') 
False
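
For a self-contained illustration, the sketch below writes three temporary files and compares them. The file names and contents are made up for the example; shallow=False forces filecmp.cmp() to compare the file contents instead of relying on the os.stat() signatures alone.

```python
import filecmp
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical sample files: a.txt and b.txt match, c.txt does not
    samples = [('a.txt', 'same\n'), ('b.txt', 'same\n'), ('c.txt', 'different\n')]
    paths = {}
    for name, text in samples:
        paths[name] = os.path.join(tmp, name)
        with open(paths[name], 'w') as fh:
            fh.write(text)

    same = filecmp.cmp(paths['a.txt'], paths['b.txt'], shallow=False)
    differ = filecmp.cmp(paths['a.txt'], paths['c.txt'], shallow=False)

print(same, differ)
```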

Upvotes: 0
