eduardo2111

Reputation: 379

Filter a list in order to get the last day (in the list) for each month

I am using the following code to search through a specific path:

path = glob.glob("C:/*/202*", recursive=True)

And I get the following list of strings as output:

C:/some_path\\20200102
C:/some_path\\20200131
C:/some_path\\20200228
C:/some_path\\20200310
C:/some_path\\20200331

The final folder name is a date in YYYYMMdd format (year, month, day).

I want to filter this list of strings so that, for each month, only the entry with the highest day number present in the list is kept.

Desired output:

C:/some_path\\20200131
C:/some_path\\20200228
C:/some_path\\20200331

I tried doing:

for i in path:
    filtered = max(path)

But this only retrieves the last date in the whole list, not the last one for each month as I want.

Upvotes: 0

Views: 51

Answers (4)

정도유

Reputation: 559

  1. Group by month
  2. Sort each group
  3. Print the last one

from itertools import groupby

path = ['C:/some_path\\20200102',
'C:/some_path\\20200131',
'C:/some_path\\20200111',
'C:/some_path\\20200228',
'C:/some_path\\20200310',
'C:/some_path\\20200331']

path_by_month = groupby(path, key=lambda x: x[:-2])
for k, g in path_by_month:
    g = sorted(g)
    print(g[-1])
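
One caveat: itertools.groupby only groups consecutive items, so the snippet above relies on the list already being ordered by month (glob does not guarantee any order). A minimal sketch that sorts first, assuming every path ends in an eight-digit YYYYMMdd folder name; the function name last_day_per_month is hypothetical:

```python
from itertools import groupby

def last_day_per_month(paths):
    """Return the path with the latest day for each YYYYMM prefix."""
    # Sort by the trailing YYYYMMdd so that equal months become adjacent,
    # which groupby needs in order to form one group per month.
    ordered = sorted(paths, key=lambda p: p[-8:])
    # Within each month group, the largest YYYYMMdd is the last day present.
    return [max(group, key=lambda p: p[-8:])
            for _, group in groupby(ordered, key=lambda p: p[-8:-2])]

# Deliberately shuffled input to show that the order no longer matters.
paths = ['C:/some_path\\20200310',
         'C:/some_path\\20200131',
         'C:/some_path\\20200102',
         'C:/some_path\\20200331',
         'C:/some_path\\20200228']
print(last_day_per_month(paths))
```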

Upvotes: 1

Muntasir Wahed

Reputation: 297

You will first need to extract the date strings from the paths. This can be done as follows:

dates = [x.split('\\')[1] for x in path]
# Output: ['20200102', '20200131', '20200228', '20200310', '20200331']

Then you can iterate over the months and take the largest day found for each one. Sample code is below. Hope it helps.

path = ['C:/some_path\\20200102',
'C:/some_path\\20200131',
'C:/some_path\\20200228',
'C:/some_path\\20200310',
'C:/some_path\\20200331']

# Retrieve the dates. Assumption: it will be prefixed by \\
dates = [x.split('\\')[1] for x in path]
# Get the month indices, with two digits
month_indices = [str(x).zfill(2) for x in range(1,13)]
# month_indices will be set to ['01', '02', '03', '04', '05', '06', '07', '08', '09', '10', '11', '12']

for month in month_indices:
    month_dates = [date for date in dates if date[4:6] == month]
    month_days = [int(date[-2:]) for date in month_dates]
    if len(month_dates) > 0:
        print(month, max(month_days))
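
The loop above prints the month and its largest day number rather than the full path the question asks for. A rough sketch of one way to keep the paths, assuming the last path component is always a valid YYYYMMdd date; the function name latest_path_per_month is made up for illustration:

```python
import ntpath
from datetime import datetime

def latest_path_per_month(paths):
    """Return, for each (year, month), the path whose folder date is latest."""
    latest = {}
    for p in paths:
        # ntpath splits on both '/' and '\\', whatever OS the script runs on.
        folder = ntpath.basename(p)
        # strptime raises ValueError if the folder name is not a real date.
        day = datetime.strptime(folder, "%Y%m%d")
        key = (day.year, day.month)
        if key not in latest or day > latest[key][0]:
            latest[key] = (day, p)
    # Dicts keep insertion order, so months appear in first-seen order.
    return [p for _, p in latest.values()]

path = ['C:/some_path\\20200102',
        'C:/some_path\\20200131',
        'C:/some_path\\20200228',
        'C:/some_path\\20200310',
        'C:/some_path\\20200331']
print(latest_path_per_month(path))
```

Parsing with datetime is slower than slicing, but it rejects folder names that only look like dates.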

Upvotes: 0

dimay

Reputation: 2804

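Try this:

import glob

names = ["202001", "202002", "202003","202004","202005","202006","202007",
         "202008", "202009","202010","202011","202012"]
path = []
for name in names:
    # Probe days from 31 down to 1 and stop at the first folder that exists
    for j in range(31,0,-1):
        # Zero-pad the day so that day 9 matches "...09" rather than "...9"
        tmp = glob.glob(f"C:/*/{name}{j:02d}", recursive=True)
        if tmp:
            # extend (not append) keeps path a flat list of strings
            path.extend(tmp)
            break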

Upvotes: 0

ScootCork

Reputation: 3686

Probably not the most efficient, but you could do it like this. It is basically a group-by and max using a dict.

paths = ['C:/some_path\\20200102',
'C:/some_path\\20200131',
'C:/some_path\\20200228',
'C:/some_path\\20200310',
'C:/some_path\\20200331']

d = {}
for path in paths: 
    d[path[-4:-2]] = d.get(path[-4:-2], []) + [path]

print([max(paths, key=lambda x: x[-2:]) for paths in d.values()])

['C:/some_path\\20200131', 'C:/some_path\\20200228', 'C:/some_path\\20200331']
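
Note that the key path[-4:-2] is the month only, so if the glob ever matches folders from more than one year, the same month from different years would fall into one group. A small sketch of the same dict-based idea keyed on YYYYMM instead; the function name and the 2021 sample entries are hypothetical:

```python
from collections import defaultdict

def last_day_by_year_month(paths):
    """Group paths by their YYYYMM digits and keep the max day of each group."""
    groups = defaultdict(list)
    for p in paths:
        groups[p[-8:-2]].append(p)  # key is YYYYMM, not just MM
    # Two-digit day strings compare correctly as text ("02" < "31").
    return [max(group, key=lambda p: p[-2:]) for group in groups.values()]

# Hypothetical data spanning two years to show the months stay separate.
sample = ['C:/some_path\\20200102',
          'C:/some_path\\20200131',
          'C:/some_path\\20210115',
          'C:/some_path\\20210120']
print(last_day_by_year_month(sample))
```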

Upvotes: 0
