Reputation: 379
I am using the following code to search through a specific path:
path = glob.glob('C:/*/202*', recursive=True)
And I get the following list of strings as output:
C:/some_path\\20200102
C:/some_path\\20200131
C:/some_path\\20200228
C:/some_path\\20200310
C:/some_path\\20200331
The final folder name is a date in YYYYMMDD format (year, month, day).
I want to filter this list of strings so that it keeps only the latest day (the highest day number) of each month that appears in the list.
Desired output:
C:/some_path\\20200131
C:/some_path\\20200228
C:/some_path\\20200331
I tried doing:
for i in path:
    filtered = max(path)
But this only retrieves the last date in the whole list, not the last date of each month as I want.
Upvotes: 0
Views: 51
Reputation: 559
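from itertools import groupby
path = ['C:/some_path\\20200102',
        'C:/some_path\\20200131',
        'C:/some_path\\20200111',
        'C:/some_path\\20200228',
        'C:/some_path\\20200310',
        'C:/some_path\\20200331']
# Group on everything except the two day digits (parent folder + YYYYMM).
# groupby only merges adjacent items, so this relies on the list being ordered by month.
path_by_month = groupby(path, key=lambda x: x[:-2])
for k, g in path_by_month:
    # The largest string in each group is the latest day of that month
    g = sorted(g)
    print(g[-1])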
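If the list is not already ordered (glob does not promise any particular order), the same groupby idea can be wrapped in a function that sorts first. This is only a sketch: the function name `last_day_per_month` is made up for illustration, and it assumes every path ends in an 8-digit YYYYMMDD folder name. It uses `ntpath.basename` so the backslash separator is handled on any operating system.

```python
from itertools import groupby
import ntpath


def last_day_per_month(paths):
    """Return, for each YYYYMM present, the path with the highest day."""
    def folder(p):
        # Last path component, e.g. '20200131' (assumed to be YYYYMMDD)
        return ntpath.basename(p)

    # Sort by folder name so that all days of a month sit next to each other
    ordered = sorted(paths, key=folder)
    return [
        max(group, key=folder)
        for _, group in groupby(ordered, key=lambda p: folder(p)[:6])
    ]


unsorted_paths = ['C:/a\\20200331', 'C:/a\\20200102', 'C:/a\\20210105',
                  'C:/a\\20200131', 'C:/a\\20200228', 'C:/a\\20200310']
print(last_day_per_month(unsorted_paths))
```

Because the key is the year plus the month, the same month in two different years ends up in two separate groups.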
Upvotes: 1
Reputation: 297
You first need to extract the date strings from the paths. That can be done as follows:
dates = [x.split('\\')[-1] for x in path]
# Output: ['20200102', '20200131', '20200228', '20200310', '20200331']
Then you can iterate over the months and pick the highest day for each one. Sample code is below. Hope it helps.
path = ['C:/some_path\\20200102',
        'C:/some_path\\20200131',
        'C:/some_path\\20200228',
        'C:/some_path\\20200310',
        'C:/some_path\\20200331']
# Retrieve the dates. Assumption: the date folder follows the last backslash
dates = [x.split('\\')[-1] for x in path]
# Get the month indices, with two digits
month_indices = [str(x).zfill(2) for x in range(1, 13)]
# month_indices will be set to ['01', '02', '03', '04', '05', '06', '07', '08', '09', '10', '11', '12']
for month in month_indices:
    # Dates belonging to this month (characters 4-5 of YYYYMMDD)
    month_dates = [date for date in dates if date[4:6] == month]
    month_days = [int(date[-2:]) for date in month_dates]
    if month_dates:
        # Prints the month and its highest day number, e.g. "01 31"
        print(month, max(month_days))
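The loop above prints month and day numbers rather than the original paths. A possible variant that returns the full paths, and also skips folders that match `202*` but are not real dates, is sketched below. The helper `parse_folder_date`, the dict `latest`, and the `2020_backup` entry are hypothetical names added for illustration. The length and digit check is there because `strptime` on its own is fairly lenient about missing leading zeros.

```python
import ntpath
from datetime import datetime


def parse_folder_date(path):
    """Return the date encoded in the last path component, or None."""
    name = ntpath.basename(path)
    if len(name) != 8 or not name.isdigit():
        return None
    try:
        return datetime.strptime(name, "%Y%m%d").date()
    except ValueError:
        # Eight digits that do not form a valid calendar date
        return None


paths = ['C:/some_path\\20200102',
         'C:/some_path\\20200131',
         'C:/some_path\\20200228',
         'C:/some_path\\2020_backup',   # hypothetical non-date folder
         'C:/some_path\\20200310',
         'C:/some_path\\20200331']

latest = {}  # (year, month) -> (date, path) of the latest day seen so far
for p in paths:
    d = parse_folder_date(p)
    if d is None:
        continue
    key = (d.year, d.month)
    if key not in latest or d > latest[key][0]:
        latest[key] = (d, p)

# Order the surviving paths chronologically
result = [p for _, p in sorted(latest.values())]
print(result)
```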
Upvotes: 0
Reputation: 2804
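Try this:
import glob
names = ["202001", "202002", "202003", "202004", "202005", "202006", "202007",
         "202008", "202009", "202010", "202011", "202012"]
path = []
for name in names:
    # Count the days down so the first match is the latest day of the month
    for j in range(31, 0, -1):
        # Zero-pad the day so that, e.g., day 5 matches '20200105'
        tmp = glob.glob(f"C:/*/{name}{j:02d}", recursive=True)
        if tmp:
            # extend keeps path a flat list of strings
            path.extend(tmp)
            break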
Upvotes: 0
Reputation: 3686
Probably not the most efficient, but you could do it like this. Basically a groupby and max using a dict.
paths = ['C:/some_path\\20200102',
         'C:/some_path\\20200131',
         'C:/some_path\\20200228',
         'C:/some_path\\20200310',
         'C:/some_path\\20200331']
d = {}
for path in paths:
    # Key on year + month (YYYYMM) so the same month in different years stays separate
    d.setdefault(path[-8:-2], []).append(path)
# For each month, keep the path whose last two characters (the day) are largest
print([max(group, key=lambda x: x[-2:]) for group in d.values()])
# Output: ['C:/some_path\\20200131', 'C:/some_path\\20200228', 'C:/some_path\\20200331']
Upvotes: 0