Arthur

Reputation: 89

How can I remove a tuple from a list when there's a None value in the tuple?

I'm working on async code that makes thousands of requests. Each request is saved as a tuple of id and response and then appended to the tasks list.

Normally I end up with a list of 4000+ tuples.

After running the code I get a list like this:

responses = [(00001, {"code": 0, "foo": "bar"}), (00002, {"code": 0, "foo": "bar"}), (00003, {"code": 0, "foo": "bar"}), (00004, None), (00005, None), (00006, {"code": 0, "foo": "bar"})]

As I only need the ones with a JSON response, I want to delete all tuples whose second element (index 1) is None.

I've tried iterating over the list and appending only the valid tuples (the ones with no None value) to a new list, but it doesn't perform well.

Is there a way to delete the tuples containing None without iterating over them one by one?

Upvotes: 1

Views: 741

Answers (3)

ofey404

Reputation: 46

TL;DR: List comprehension performs best.

The builtin filter comes next, and multiprocessing.Pool is overkill, i.e. the slowest.

I tested all of them on my machine with Python 3.10.2. Output:

$ python main.py 
  288.01 mks in filter_LC([(1, {'code': 0, 'foo'...)  # List comprehension
  469.21 mks in filter_builtin([(1, {'code': 0, 'foo'...) # Builtin filter
   15.28 ms in filter_pool([(1, {'code': 0, 'foo'...) # Multiprocessing

Test code:

from multiprocessing import Pool  # use process
# from multiprocessing.dummy import Pool  # thread based Pool performs better than process but only slightly
from funcy import print_durations

responses = [(1, {"code": 0, "foo": "bar"}),
             (2, None),
             (3, {"code": 0, "foo": "bar"}),
             (4, None),
             (5, None),
             (6, {"code": 0, "foo": "bar"})] * 1000

@print_durations
def filter_LC(responses):
    return [c for c in responses if c[1] != None]

@print_durations
def filter_builtin(responses):
    return list(filter(lambda c: c[1] != None, responses))

# Helpers for filter_pool()

def valid(x):
    if len(x) < 2 or x[1] == None:
        return False
    return True

def pool_filter(pool, func, candidates):
    return [c for c, keep in zip(candidates, pool.map(func, candidates)) if keep]

@print_durations
def filter_pool(responses, pool_size=5):
    with Pool(pool_size) as p:
        return pool_filter(p, valid, responses)

if __name__ == "__main__":
    ans = [
        filter_LC(responses),
        filter_builtin(responses),
        filter_pool(responses),
    ]
    for a in ans:
        assert a == ans[0]

The list comprehension beats the builtin filter. My guess is that filter suffers from the overhead of calling the lambda for every element, which the list comprehension doesn't have.

A thread/process pool is overkill here; better to save it for jobs more time-consuming than filtering ;)
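The comparison between the list comprehension and the builtin filter can also be reproduced without funcy. This is a minimal sketch, assuming only the standard library's timeit module; the sample data and the function names filter_comprehension and filter_builtin are introduced here for illustration, and the timings will vary by machine and Python version.

```python
import timeit

# Hypothetical sample data shaped like the question's (id, response) tuples.
responses = [
    (1, {"code": 0, "foo": "bar"}),
    (2, None),
    (3, {"code": 0, "foo": "bar"}),
] * 1000


def filter_comprehension(items):
    # Keep tuples whose response (index 1) is present.
    return [item for item in items if item[1] is not None]


def filter_builtin(items):
    # Same predicate, but called through a lambda once per element.
    return list(filter(lambda item: item[1] is not None, items))


if __name__ == "__main__":
    runs = 200
    for func in (filter_comprehension, filter_builtin):
        seconds = timeit.timeit(lambda: func(responses), number=runs)
        print(f"{func.__name__}: {seconds / runs * 1e6:.1f} microseconds per call")
```

Both functions return the same list, so the only thing being compared is how the predicate is evaluated.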

Reference:

The pool_filter() snippet comes from the Stack Overflow question "How to use parallel processing filter in Python?"

Upvotes: 3

Lucas Azevedo

Reputation: 2370

The idea behind the filter function is great and very Pythonic. Its weakness is performance: it runs in a single process, and a lambda predicate cannot simply be handed to a multiprocessing pool, because lambdas are not picklable by default.
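The pickling constraint mentioned above can be checked directly. This is a minimal sketch, assuming the standard pickle module, which multiprocessing uses to send work to its worker processes; can_pickle and has_response are hypothetical names introduced here for illustration.

```python
import pickle


def can_pickle(obj):
    """Return True if pickle.dumps accepts `obj`, False otherwise."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


def has_response(pair):
    # A named module-level function is pickled by reference to its name,
    # which is why it can be passed to a process pool.
    return pair[1] is not None


if __name__ == "__main__":
    print("named function:", can_pickle(has_response))
    # A lambda has no importable name, so pickle cannot reference it.
    print("lambda:", can_pickle(lambda pair: pair[1] is not None))
```

This is why the pool-based approach needs a named predicate function rather than the lambda normally passed to filter.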

Map and reduce are other alternatives; see the reference here. An example solution is:

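from multiprocessing import Pool

def validate_request(request):
    return request[1] is not None  # True when a response is present

with Pool() as pool:  # in a script, run this under `if __name__ == "__main__":`
    requests = [r for r, valid in zip(requests, pool.map(validate_request, requests)) if valid]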

Upvotes: 1

jjramsey

Reputation: 1171

You might try using Python's filter(). For example, you could do this:

valids = filter(lambda x: x[1] is not None, responses)

where for your sample responses variable, valids would be

[(1, {'code': 0, 'foo': 'bar'}), 
 (2, {'code': 0, 'foo': 'bar'}), 
 (3, {'code': 0, 'foo': 'bar'}), 
 (6, {'code': 0, 'foo': 'bar'})]

BTW, in Python 3, leading zeros on decimal integer literals are not allowed. So this code is for Python 2.x.

Whether this would be more performant than a list comprehension, I can't say for sure, though one blog post indicates that it may not be. Under the hood, Python is probably still iterating over the tuples one by one, but at least that's not reflected in the semantics of the code.
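On Python 3 the same call behaves slightly differently: filter() returns a lazy iterator rather than a list, so it has to be wrapped in list() to get the output shown above. A minimal sketch, assuming Python 3 and ids written without leading zeros; the names lazy_valids and same_result are introduced here for illustration.

```python
responses = [
    (1, {"code": 0, "foo": "bar"}),
    (2, {"code": 0, "foo": "bar"}),
    (3, {"code": 0, "foo": "bar"}),
    (4, None),
    (5, None),
    (6, {"code": 0, "foo": "bar"}),
]

# In Python 3 this is a lazy filter object; nothing is evaluated yet.
lazy_valids = filter(lambda x: x[1] is not None, responses)

# Materialize it once; the iterator is exhausted afterwards.
valids = list(lazy_valids)

# Equivalent list comprehension, for comparison.
same_result = [pair for pair in responses if pair[1] is not None]

if __name__ == "__main__":
    print(valids)
    print(valids == same_result)
```

Either form still visits every tuple once; the difference is only in how the loop is spelled.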

Upvotes: 1
