caasswa

Reputation: 521

How to filter a list of strings so that only the date strings remain?

How can I filter this list so that only the strings in yyyy-mm-dd format are left?

2021-11-11
2021-10-01
some_folder
some_other_folder

so that we end up with a list like so:

2021-11-11
2021-10-01

Also, what if the strings in the list have a prefix?

root/2021-11-11
root/2021-10-01
user/some_folder
root/some_other_folder

and we wanted to end up with:

root/2021-11-11
root/2021-10-01

Upvotes: 2

Views: 636

Answers (3)

S.B

Reputation: 16496

I would let the datetime module handle that using strptime. If the string is not in '%Y-%m-%d' format, strptime raises ValueError:

import datetime

lst = ['2021-11-11', '2021-10-01', 'some_folder', 'some_other_folder',
       'root/2021-11-11', 'root/2021-10-01',
       'user/some_folder', 'root/some_other_folder']


def filter_(s):
    last_part = s.rsplit('/', maxsplit=1)[-1]
    try:
        datetime.datetime.strptime(last_part, '%Y-%m-%d')
        return True
    except ValueError:
        return False


print([i for i in lst if filter_(i)])

Output:

['2021-11-11', '2021-10-01', 'root/2021-11-11', 'root/2021-10-01']
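One caveat: strptime is lenient about zero padding, so '2021-1-1' also parses with '%Y-%m-%d'. If the strings must be strictly yyyy-mm-dd, a shape check can be run before the calendar check. This is a minimal sketch, not part of the answer above; is_iso_date and DATE_SHAPE are names made up for illustration:

```python
import re
from datetime import datetime

# Hypothetical names; only re and datetime come from the standard library.
DATE_SHAPE = re.compile(r'\d{4}-\d{2}-\d{2}')


def is_iso_date(s):
    """Return True if the last '/'-separated part of s is a real yyyy-mm-dd date."""
    last_part = s.rsplit('/', maxsplit=1)[-1]
    # Shape check first: strptime alone would accept '2021-1-1'.
    if not DATE_SHAPE.fullmatch(last_part):
        return False
    try:
        # Calendar check: rejects values such as '2021-13-40'.
        datetime.strptime(last_part, '%Y-%m-%d')
        return True
    except ValueError:
        return False


candidates = ['2021-11-11', 'root/2021-10-01', 'root/2021-1-1',
              'root/2021-13-40', 'user/some_folder']
strict = [s for s in candidates if is_iso_date(s)]
print(strict)  # ['2021-11-11', 'root/2021-10-01']
```

The regex guards the exact digit layout, while strptime confirms that the month and day are valid calendar values.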

Upvotes: 5

user11717481

Reputation: 1602

>>> import re
>>> 
>>> filter_pattern = re.compile(r'.*\d{4}-\d{2}-\d{2}$')
>>> 
>>> lst = [
... '2021-11-11', '2021-10-01', 'some_folder', 
... 'some_other_folder', 'root/2021-11-11', 'root/2021-10-01',
... 'user/some_folder', 'root/some_other_folder'
... ]
>>> 
>>> lst = [i for i in lst if filter_pattern.match(i)]
>>> 
>>> lst
['2021-11-11', '2021-10-01', 'root/2021-11-11', 'root/2021-10-01']
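Note that a pattern starting with .* accepts any text before the date, for example 'root/backup2021-10-01', and that a regex checks only the shape of the string, not whether the date exists. If the date should be a whole path component, and possibly only under a given prefix, something along these lines might work. It is a rough sketch; filter_dates and its prefix parameter are hypothetical names, not anything from the question:

```python
import re

# The date must be the whole last path component: either the entire string
# or everything after the final '/'.
DATE_COMPONENT = re.compile(r'(?:.*/)?\d{4}-\d{2}-\d{2}')


def filter_dates(items, prefix=''):
    """Keep strings that start with prefix and end in a yyyy-mm-dd component.

    Hypothetical helper: this is a shape check only, so '2021-99-99' still passes.
    """
    return [s for s in items
            if s.startswith(prefix) and DATE_COMPONENT.fullmatch(s)]


paths = ['2021-11-11', 'root/2021-11-11', 'user/2021-10-01',
         'root/backup2021-10-01', 'root/some_other_folder']
print(filter_dates(paths))                  # ['2021-11-11', 'root/2021-11-11', 'user/2021-10-01']
print(filter_dates(paths, prefix='root/'))  # ['root/2021-11-11']
```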

Upvotes: 1

Nathan Roberts

Reputation: 838

You can use the re module for this, something like the following.

Edit: I changed my answer after @SorousHBakhtiary's comment. I had forgotten that removing items from a list while iterating over it causes problems (elements get skipped), so the loop now iterates over a copy.

import re

li = [
'root/2021-11-11',
'root/2021-10-01',
'user/some_folder',
'root/some_other_folder',
]

new_list = li.copy()

# Iterate over the copy so that removing from li does not disturb the loop.
for string in new_list:
    if not re.fullmatch(r'.*\d{4}-\d{2}-\d{2}', string):
        li.remove(string)

This can also be done in one line using list comprehension:

li = [
'root/2021-11-11',
'root/2021-10-01',
'user/some_folder',
'root/some_other_folder',
]

li = [string for string in li if re.fullmatch(r'.*\d{4}-\d{2}-\d{2}', string)]
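To illustrate why the loop version iterates over a copy: removing from a list while looping over that same list does not raise an error, it silently skips the element that follows each removed one. A small sketch with made-up sample data:

```python
items = ['a_folder', 'b_folder', '2021-11-11', 'c_folder']

# Buggy: removing from the list being iterated shifts the remaining items left,
# so the element right after each removed one is never visited.
buggy = items.copy()
for s in buggy:
    if 'folder' in s:
        buggy.remove(s)

# Safe: iterate over a snapshot and mutate the original.
safe = items.copy()
for s in safe.copy():
    if 'folder' in s:
        safe.remove(s)

print(buggy)  # ['b_folder', '2021-11-11']
print(safe)   # ['2021-11-11']
```

The list comprehension avoids the issue entirely because it builds a new list instead of mutating the one being read.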

Upvotes: 2
