cjg123

Reputation: 473

How can I remove all strings that fit certain format from a list?

Question: Say I have a list a = ['abd', ' the dog', '4:45 AM', '1234 total', 'etc...','6:31 PM', '2:36']

How can I remove elements such as '4:45 AM', '6:31 PM' and '2:36'? That is, how can I remove elements of the form number:number|number and those with AM/PM on the end?

To be honest, I haven't tried much, as I'm not really sure where to begin, other than something like:

[x for x in a if x != something]

Upvotes: 7

Views: 2249

Answers (7)

chandima

Reputation: 141

Check this implementation.

import re

a = ['abd', ' the dog', '4:45 AM', '1234 total', 'etc...','6:31 PM', '2:36']
regex = re.compile(r'^[0-2]{0,1}[0-9]\:[0-5][0-9]\s{0,1}([AP][M]){0,1}')

a  = [x for x in a if not regex.match(x)]
print(a)

OUTPUT

['abd', ' the dog', '1234 total', 'etc...']
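
To make it easier to see what each piece of that pattern does, here is a sketch of an equivalent pattern written with re.VERBOSE. The name TIME_RE is only illustrative, and `?` is used in place of `{0,1}`, which means the same thing.

```python
import re

# Equivalent to the pattern above, spread out with re.VERBOSE.
# TIME_RE is an illustrative name, not part of the original answer.
TIME_RE = re.compile(r"""
    ^[0-2]?[0-9]    # one or two digits for the hour
    :               # a literal colon
    [0-5][0-9]      # two digits for the minutes
    \s?             # at most one whitespace character
    (?:[AP]M)?      # optional AM or PM
""", re.VERBOSE)

a = ['abd', ' the dog', '4:45 AM', '1234 total', 'etc...', '6:31 PM', '2:36']
kept = [x for x in a if not TIME_RE.match(x)]
print(kept)  # ['abd', ' the dog', '1234 total', 'etc...']
```

Note that the pattern has no `$` at the end, so a string that merely starts with a time, such as '2:36 and more', would be removed as well.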

Upvotes: 2

user10941319

Reputation:

Try this code in pure Python. First, it checks the last two characters of each element: if they are 'am' or 'pm' (in any case), the element is removed from the list. Second, if the element contains ':', it checks the text before and after the ':'; if both are digits, the element is removed. The idea supports number|number:number and number:number|number.

def removeElements(a):
    removed_elements = []
    L = len(a)
    for i in range(L):
        element = a[i]
        if 'am' == element[-2:].lower() or 'pm' ==element[-2:].lower() :
            removed_elements.append(element)
        if ':' in element:
            part1 = element.split(':')
            part2 = element.split(':')
            if part1[-1].isdigit() and part2[0].isdigit():
                removed_elements.append(element)
    output =  []
    for element in a:
        if not(element in removed_elements):
            output.append(element)
    return output

a = ['abd', ' the dog', '4:45 AM', '1234 total', 'etc...','6:31 PM', '2:36']
output = removeElements(a)
print(output)

The output for this example is: ['abd', ' the dog', '1234 total', 'etc...']
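
The same two checks can probably be written more compactly as a predicate plus a list comprehension. This is only a sketch: looks_like_time is a hypothetical helper name, and unlike the function above it looks only at the single character on each side of the first ':'.

```python
def looks_like_time(s):
    """Hypothetical helper: True if s ends in AM/PM or has a digit on each side of a ':'."""
    if s[-2:].lower() in ('am', 'pm'):
        return True
    before, sep, after = s.partition(':')
    # sep is '' when there is no colon; slicing avoids IndexError on empty parts.
    return bool(sep) and before[-1:].isdigit() and after[:1].isdigit()

a = ['abd', ' the dog', '4:45 AM', '1234 total', 'etc...', '6:31 PM', '2:36']
output = [s for s in a if not looks_like_time(s)]
print(output)  # ['abd', ' the dog', '1234 total', 'etc...']
```

One thing to keep in mind with the AM/PM check in either version: it looks only at the last two characters, so an ordinary word such as 'spam' would be removed too.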

Upvotes: 3

U13-Forward

Reputation: 71610

You don't need regex, try using:

>>> a = ['abd', ' the dog', '4:45 AM', '1234 total', 'etc...','6:31 PM', '2:36']
>>> [i for i in a if ':' not in i and i[-2:] not in ['AM','PM']]
['abd', ' the dog', '1234 total', 'etc...']
>>> 

Or use a much easier solution with regex:

>>> import re
>>> a = ['abd', ' the dog', '4:45 AM', '1234 total', 'etc...','6:31 PM', '2:36']
>>> [i for i in a if not re.search(r'\d+:\d+', i)]
['abd', ' the dog', '1234 total', 'etc...']
>>> 

Or a non-regex version that's even simpler:

>>> a = ['abd', ' the dog', '4:45 AM', '1234 total', 'etc...','6:31 PM', '2:36']
>>> [i for i in a if ':' not in i]
['abd', ' the dog', '1234 total', 'etc...']
>>> 
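
One caveat worth checking against your own data: the colon-only test removes every string that contains ':', not just times. A small sketch with made-up sample strings (they are not from the question) shows the difference from the digit-aware regex:

```python
import re

# Made-up sample data: 'note: call bob' contains a colon but is not a time.
a = ['abd', 'note: call bob', '4:45 AM', '2:36']

# Drops anything with a colon in it.
colon_only = [i for i in a if ':' not in i]

# Drops only strings that have digits on both sides of a colon.
digits_around_colon = [i for i in a if not re.search(r'\d+:\d+', i)]

print(colon_only)           # ['abd']
print(digits_around_colon)  # ['abd', 'note: call bob']
```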

Upvotes: 1

TehVulpes

Reputation: 51

Consider using the built-in filter function with a compiled regex.

>>> import re
>>> no_times = re.compile(r'^(?!\d\d?:\d\d(\s*[AP]M)?$).*$')
>>> a = ['abd', ' the dog', '4:45 AM', '1234 total', 'etc...','6:31 PM', '2:36']

>>> filter(no_times.match, a)
['abd', ' the dog', '1234 total', 'etc...']

A lambda can also be used for the first argument if, for example, you wanted to avoid compiling a regex, though it is messier.

>>> filter(lambda s: not re.match(r'^\d\d?:\d\d(\s*[AP]M)?$', s), a)
['abd', ' the dog', '1234 total', 'etc...']

Note that in Python 3, filter returns a lazy iterator instead of a list, so wrap the call in list(...) to get the lists shown above.


The regular expression uses a negative lookahead, (?!...), to accept every string except those matching \d\d?:\d\d(\s*[AP]M)?$. In other words, it matches all strings except those of the form H:MM or HH:MM, optionally followed by whitespace and AM or PM.
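
For Python 3, a minimal sketch of the same idea looks like this. The only change is the list(...) wrapper around filter; the name result is just for illustration.

```python
import re

# Negative lookahead: match any string that is NOT a bare time like 4:45 or 4:45 AM.
no_times = re.compile(r'^(?!\d\d?:\d\d(\s*[AP]M)?$).*$')

a = ['abd', ' the dog', '4:45 AM', '1234 total', 'etc...', '6:31 PM', '2:36']

# filter() is lazy in Python 3, so materialize it with list().
result = list(filter(no_times.match, a))
print(result)  # ['abd', ' the dog', '1234 total', 'etc...']
```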

Upvotes: 3

dawg

Reputation: 104102

A regex is the easy answer.

Here is an alternative with pure Python:

>>> a = ['abd', ' the dog', '4:45', '1234 total', 'etc...','6:31', '1234']
>>> [s for s in a if not all(e.isdigit() for e in s.split(':'))]
['abd', ' the dog', '1234 total', 'etc...']

Note the side effect: '1234'.split(':') returns ['1234'], which is all digits, so strings made up only of digits are filtered out as well.


If there is a possibility of '1:2:3' type numbers:

>>> a = ['abd', ' the dog', '4:45', '1234 total', 'etc...','6:31', '1234', '1:2:3']
>>> [s for s in a if len(s.split(':'))<=2 and not all(e.isdigit() for e in s.split(':'))]
['abd', ' the dog', '1234 total', 'etc...']
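
To see why '1234' is dropped, and what happens with strings that carry an AM/PM suffix (which the sample lists in this answer do not include), it may help to look at the all-digits test on its own. all_digit_parts is a hypothetical helper name used only for this sketch.

```python
def all_digit_parts(s):
    # Hypothetical helper: True when every ':'-separated piece consists of digits only.
    return all(part.isdigit() for part in s.split(':'))

samples = ['1234', '4:45', '4:45 AM', '1234 total']
results = {s: all_digit_parts(s) for s in samples}
print(results)
# {'1234': True, '4:45': True, '4:45 AM': False, '1234 total': False}
```

Since '45 AM'.isdigit() is False, a string like '4:45 AM' would stay in the list with this approach, so it seems best suited to data without AM/PM suffixes.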

Upvotes: 2

vks

Reputation: 67988

You can use the regular expression \d+(?::\d+)?$ and filter with it.

See demo.

https://regex101.com/r/HoGZYh/1

import re
a = ['abd', ' the dog', '4:45', '1234 total', '123', '6:31']
print([i for i in a if not re.match(r"\d+(?::\d+)?$", i)])

Output: ['abd', ' the dog', '1234 total']
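
On Python 3.4 or later, a possible variation is re.fullmatch, which requires the whole string to match and so makes the trailing $ unnecessary. This is a sketch of that alternative rather than the code above; pattern and kept are illustrative names.

```python
import re

# Same pattern without the trailing $; fullmatch() anchors both ends itself.
pattern = re.compile(r"\d+(?::\d+)?")

a = ['abd', ' the dog', '4:45', '1234 total', '123', '6:31']
kept = [i for i in a if not pattern.fullmatch(i)]
print(kept)  # ['abd', ' the dog', '1234 total']
```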

Upvotes: 11

timgeb

Reputation: 78790

The regular expression \d:\d\d$ matches a single digit, then a :, followed by two digits.

>>> import re
>>> a = ['abd', ' the dog', '4:45', '1234 total', 'etc...', '6:31']
>>> regex = re.compile(r'\d:\d\d$')
>>> [s for s in a if regex.match(s)]
['4:45', '6:31']
>>> [s for s in a if not regex.match(s)]
['abd', ' the dog', '1234 total', 'etc...']

\d+:\d+$ would match any number n >= 1 of digits on each side of the :. I suggest you play around with it. The documentation is here.

Detail: $ specifies the end of the string, and re.match starts looking at the start of the string.
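
A quick sketch of how the anchoring plays out in practice; strict and loose are just illustrative names for the two patterns mentioned above.

```python
import re

strict = re.compile(r'\d:\d\d$')   # exactly one digit before the colon
loose = re.compile(r'\d+:\d+$')    # one or more digits on each side

# match() only tries at the start of the string.
print(bool(strict.match('4:45')))     # True
print(bool(strict.match('12:45')))    # False: '1' is followed by '2', not ':'
print(bool(loose.match('12:45')))     # True

# search() scans forward, so it finds '2:45' inside '12:45'.
print(bool(strict.search('12:45')))   # True

# $ requires the string to end right after the minutes.
print(bool(strict.match('4:45 AM')))  # False
```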

Upvotes: 2
