Reputation: 993
I have a list that looks like this:
list = ['Julio Cesar por inhumana (?)', '1/4/2015', '1/4/2015', '1/4/2015']
and I just want the dates. I have a regex that looks like this:
r'\b(\d+/\d+/\d{4})\b'
but I don't really know how to apply it to a list. Or maybe it can be done some other way.
Any help would be really appreciated.
Upvotes: 1
Views: 90
Reputation: 48067
You can achieve this by using re.match().
Note: list is the name of a built-in type in Python (it is not a reserved keyword). You should not use it as a variable name, because doing so shadows the built-in.
import re
str_list = ['Julio Cesar por inhumana (?)', '1/4/2015', '1/4/2015', '1/4/2015']
# Iterate over a copy, list(str_list), so that removing unmatched
# strings from the original 'str_list' does not disturb the loop
for s in list(str_list):
    if not re.match(r'\b(\d+/\d+/\d{4})\b', s):
        str_list.remove(s)
Or you can use a list comprehension if you also want to keep the original list:
import re
str_list = ['Julio Cesar por inhumana (?)', '1/4/2015', '1/4/2015', '1/4/2015']
new_list = [s for s in str_list if re.match(r'\b(\d+/\d+/\d{4})\b', s)]
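If the original list object has to be updated (for example because other code holds a reference to it), slice assignment is one way to combine the two approaches above. This is only a sketch; the helper name keep_dates_in_place and the variable alias are made up for illustration.

```python
import re

# Same pattern as in the question, compiled once for reuse.
DATE_RE = re.compile(r'\b(\d+/\d+/\d{4})\b')


def keep_dates_in_place(items):
    """Filter `items` in place, keeping only strings that start with a date."""
    # Slice assignment replaces the contents of the existing list object,
    # so every other reference to that list sees the filtered result.
    items[:] = [s for s in items if DATE_RE.match(s)]
    return items


str_list = ['Julio Cesar por inhumana (?)', '1/4/2015', '1/4/2015', '1/4/2015']
alias = str_list  # hypothetical second reference to the same list

keep_dates_in_place(str_list)
print(str_list)           # ['1/4/2015', '1/4/2015', '1/4/2015']
print(alias is str_list)  # True
```

Compared with calling remove() in a loop, this makes a single pass over the list and needs no explicit copy.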
Upvotes: 1
Reputation: 8840
If the list is long, compiling the pattern first can give better performance:
import re
# list is a built-in name in Python (not a keyword), but shadowing it is still
# risky; borrow the PEP 8 trailing underscore (https://www.python.org/dev/peps/pep-0008/)
# quote: single_trailing_underscore_ : used by convention to avoid conflicts
# with Python keyword, e.g.
list_ = ['Julio Cesar por inhumana (?)', '1/4/2015', '1/4/2015', '1/4/2015']
date_pattern = re.compile(r'\b(\d+/\d+/\d{4})\b')
# list() is needed on Python 3, where filter() returns an iterator
print(list(filter(date_pattern.match, list_)))
# equivalent to
# print([i for i in list_ if date_pattern.match(i)])
# produces ['1/4/2015', '1/4/2015', '1/4/2015']
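How much compiling helps is worth measuring, because the re module keeps a cache of recently compiled patterns, so re.match(pattern, s) does not recompile on every call. A rough timing sketch, assuming a made-up larger input built by repeating the sample list (the function names are hypothetical, and the numbers will vary by machine):

```python
import re
import timeit

PATTERN = r'\b(\d+/\d+/\d{4})\b'
compiled = re.compile(PATTERN)

# Hypothetical larger input: the sample list repeated many times.
data = ['Julio Cesar por inhumana (?)', '1/4/2015', '1/4/2015', '1/4/2015'] * 10000


def with_module_function():
    # Looks the pattern up in re's internal cache on every call.
    return [s for s in data if re.match(PATTERN, s)]


def with_compiled_pattern():
    # Calls the method on the already compiled pattern object.
    return [s for s in data if compiled.match(s)]


# Both approaches must select the same strings.
assert with_module_function() == with_compiled_pattern()

t_module = timeit.timeit(with_module_function, number=5)
t_compiled = timeit.timeit(with_compiled_pattern, number=5)
print('re.match(pattern, s): %.3f s' % t_module)
print('compiled.match(s):    %.3f s' % t_compiled)
```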
Upvotes: 3
Reputation: 8127
Very simple. Just use re.match:
>>> import re
>>> mylist = ['Julio Cesar por inhumana (?)', '1/4/2015', '1/4/2015', '1/4/2015']
>>> dates = [x for x in mylist if re.match(r'\b(\d+/\d+/\d{4})\b', x)]
>>> dates
['1/4/2015', '1/4/2015', '1/4/2015']
re.match only matches at the start of the string, so it's what you want for this case. Also, I wouldn't name a list "list". Because that's the name of the built-in list class, you could hurt yourself later if you try to do list(some_iterable). Best not to get in that habit.
Finally, your regex will match any string that starts with a date. If you want to ensure that the entire string is your date, you could modify it slightly to r'(\d{1,2}/\d{1,2}/\d{4})$'. This ensures that the month and day are each 1 or 2 digits and the year is exactly 4 digits.
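To see the difference between "starts with a date" and "is entirely a date", here is a small comparison. It is a sketch that uses re.fullmatch (available from Python 3.4) in place of the trailing $; the extra sample strings are invented for illustration.

```python
import re

# Pattern from the question: matches strings that merely start with a date.
prefix_re = re.compile(r'\b(\d+/\d+/\d{4})\b')
# Stricter pattern: 1-2 digit month and day, 4 digit year.
strict_re = re.compile(r'\d{1,2}/\d{1,2}/\d{4}')

# Invented samples: a plain date, a date with trailing text,
# a "date" with a 3 digit first field, and a non-date.
samples = [
    '1/4/2015',
    '1/4/2015 (approx.)',
    '123/4/2015',
    'Julio Cesar por inhumana (?)',
]

starts_with_date = [s for s in samples if prefix_re.match(s)]
# fullmatch() succeeds only if the whole string matches the pattern.
whole_string_date = [s for s in samples if strict_re.fullmatch(s)]

print(starts_with_date)   # ['1/4/2015', '1/4/2015 (approx.)', '123/4/2015']
print(whole_string_date)  # ['1/4/2015']
```

Neither pattern checks that the value is a real calendar date; something like '99/99/2015' still passes both.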
Upvotes: 6