Aurora Eugene

Reputation: 75

How to find oldest and newest file in a directory?

My code should find the newest and oldest files in a folder and its subfolders. It works for the top-level folder but it doesn't include files within subfolders.

import os
import glob

mypath = 'C:/RDS/*'

print(min(glob.glob(mypath), key=os.path.getmtime))
print(max(glob.glob(mypath), key=os.path.getmtime))

How do I make it recurse into the subfolders?

Upvotes: 0

Views: 2440

Answers (4)

martineau

Reputation: 123531

Here's a fairly efficient way of doing it. It determines the oldest and newest files by iterating through them all once. Since it uses iteration, there's no need to first create a list of them and go through it twice to determine the two extremes.

import os
import pathlib

def max_min(iterable, keyfunc=None):
    if keyfunc is None:
        keyfunc = lambda x: x  # Identity.

    iterator = iter(iterable)
    most = least = next(iterator)
    mostkey = leastkey = keyfunc(most)

    for item in iterator:
        key = keyfunc(item)
        if key > mostkey:
            most = item
            mostkey = key
        elif key < leastkey:
            least = item
            leastkey = key
    return most, least


mypath = '.'
files = (f for f in pathlib.Path(mypath).resolve().glob('**/*') if f.is_file())
newest, oldest = max_min(files, keyfunc=os.path.getmtime)  # max_min returns (most, least)
print(f'oldest file: {oldest}')
print(f'newest file: {newest}')
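
The same single-pass idea can also be written with os.walk. What follows is only a minimal sketch under a few assumptions: the function name oldest_and_newest is made up for illustration, a tree without any files yields (None, None) instead of the StopIteration that next() raises on an empty iterator, and entries whose timestamp cannot be read are silently skipped.

```python
import os


def oldest_and_newest(root):
    """Return (oldest, newest) file paths under root by modification time.

    Hypothetical helper: returns (None, None) when no files are found.
    """
    oldest = newest = None
    oldest_mtime = newest_mtime = None

    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mtime = os.path.getmtime(path)
            except OSError:
                continue  # e.g. a broken symlink or a file removed meanwhile

            # Two independent checks, so the first file sets both extremes.
            if oldest_mtime is None or mtime < oldest_mtime:
                oldest, oldest_mtime = path, mtime
            if newest_mtime is None or mtime > newest_mtime:
                newest, newest_mtime = path, mtime

    return oldest, newest
```

Calling oldest_and_newest('.') returns the two paths as strings; each file's timestamp is read only once.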

Upvotes: 0

Marcel Preda

Reputation: 1205

Pay attention to the OS path separator: "/" on Unix vs. "\" on Windows. You can try something like the code below. It stores the list of paths in a variable, which is faster than traversing the file system twice. One line is there only for debugging; comment it out in production.

import os
import glob

mypath = r'D:\RDS\**'  # raw string, so the backslashes are kept literally

allFilesAndFolders = glob.glob(mypath, recursive=True)
# just for debugging
print(allFilesAndFolders)

print(min(allFilesAndFolders, key=os.path.getmtime))
print(max(allFilesAndFolders, key=os.path.getmtime))
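
Two details are worth checking with this approach: a '**' pattern also returns directories (including the starting folder itself), and hard-coding the separator ties the code to one OS. A minimal sketch of one way to handle both, assuming a hypothetical helper name files_under; note that the glob module skips dot-files by default, so hidden files are not considered here.

```python
import glob
import os


def files_under(root):
    """Return every regular file below root (hypothetical helper).

    os.path.join picks the right separator for the current OS, and
    glob.escape protects any special characters in root itself.
    """
    pattern = os.path.join(glob.escape(root), "**")
    # recursive=True makes '**' descend into subfolders; the result also
    # contains directories, so keep only regular files.
    return [p for p in glob.glob(pattern, recursive=True) if os.path.isfile(p)]
```

The list can then be passed to min(..., key=os.path.getmtime) and max(..., key=os.path.getmtime) as above; both calls raise ValueError if the list is empty.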

Upvotes: 0

Abhi_J

Reputation: 2129

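Try using pathlib. Also, getmtime gives the last-modified time; if you want the time the file was created, use getctime. Be aware that getctime returns the creation time only on Windows; on Unix it returns the time of the last metadata change.

If you strictly want only files: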

import os
import pathlib

mypath = 'your path'
taggedrootdir = pathlib.Path(mypath)
print(min([f for f in taggedrootdir.resolve().glob('**/*') if f.is_file()], key=os.path.getctime))
print(max([f for f in taggedrootdir.resolve().glob('**/*') if f.is_file()], key=os.path.getctime))

If results may include folders:

import os
import pathlib

mypath = 'your path'
taggedrootdir = pathlib.Path(mypath)
print(min(taggedrootdir.resolve().glob('**/*'), key=os.path.getctime))
print(max(taggedrootdir.resolve().glob('**/*'), key=os.path.getctime))
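
Because the meaning of getctime differs between platforms, a best-effort creation time can be read from os.stat instead. The sketch below is an assumption-laden illustration, not a definitive solution: the helper names are made up, st_birthtime is only present on some platforms (macOS and the BSDs, and Windows from Python 3.12), and on Linux the fallback st_ctime is the last metadata change rather than a creation time. It also walks the tree once and reuses the list for both extremes.

```python
import os
import pathlib


def creation_time(path):
    """Best-effort creation time in seconds since the epoch (hypothetical helper)."""
    st = os.stat(path)
    try:
        return st.st_birthtime  # not available on every platform
    except AttributeError:
        # Windows: creation time; Linux: time of the last metadata change.
        return st.st_ctime


def oldest_and_newest_by_creation(root):
    """Return (oldest, newest) files under root, or (None, None) if there are none."""
    files = [p for p in pathlib.Path(root).rglob("*") if p.is_file()]
    if not files:
        return None, None
    # Read each timestamp once, then pick both extremes from the same list.
    stamped = [(creation_time(p), p) for p in files]
    oldest = min(stamped, key=lambda pair: pair[0])[1]
    newest = max(stamped, key=lambda pair: pair[0])[1]
    return oldest, newest
```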

Upvotes: 1

Kraay89

Reputation: 957

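As the docs show, you can pass a recursive=True keyword argument to glob.glob(). It only has an effect when the pattern contains '**', so the pattern has to change from '*' to '**' as well. The result then also contains folders, so filter those out if you only want files.

So your code becomes:

import os
import glob

mypath = 'C:/RDS/**'

# '**' with recursive=True matches files and folders at every depth
paths = [p for p in glob.glob(mypath, recursive=True) if os.path.isfile(p)]

print(min(paths, key=os.path.getmtime))
print(max(paths, key=os.path.getmtime))

This should give you the oldest and newest file in your folder and all its subfolders.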
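
How '*' and '**' behave under recursive=True can be checked on a throwaway directory tree. The following is a small sketch; the function name and the file names top.txt and sub/deep.txt are made up for the demonstration.

```python
import glob
import os
import tempfile


def star_vs_double_star():
    """Compare '*' and '**' patterns, both with recursive=True (illustrative only)."""
    with tempfile.TemporaryDirectory() as root:
        # Build a tiny tree: root/top.txt and root/sub/deep.txt
        os.mkdir(os.path.join(root, "sub"))
        open(os.path.join(root, "top.txt"), "w").close()
        open(os.path.join(root, "sub", "deep.txt"), "w").close()

        safe_root = glob.escape(root)
        single = glob.glob(os.path.join(safe_root, "*"), recursive=True)
        double = glob.glob(os.path.join(safe_root, "**"), recursive=True)

        # Report paths relative to root so the results are easy to compare.
        def relative(paths):
            return sorted(os.path.relpath(p, root) for p in paths)

        return relative(single), relative(double)


single, double = star_vs_double_star()
print("with '*': ", single)
print("with '**':", double)
```

Only the '**' result should list the file inside the subfolder; the '*' result stops at the top level even though recursive=True is passed.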

Upvotes: 1
