Reputation: 2158
I want to select the entries matching a certain value out of a list of dicts in Python 3. This should result in two lists: a new list with the selected entries, and the original list with those entries removed.
Scenario
Assume we have a list of dicts:
import random, sys, time
letters_1 = []
colors = ["red", "orange", "yellow", "green", "blue", "purple"]
for i in range(100000):
    letter = {"color": random.choice(colors), "content": random.randint(0, sys.maxsize)}
    letters_1.append(letter)
letters_2 = list(letters_1)
We want to select all dicts with a certain value for a certain key, collect them into a new list and leave only the others in the initial list. This corresponds to how one would select all red-colored letters out of an actual stack of letters.
Possibilities
This can be done via list comprehension or via a for loop.
The problem with the list comprehension is that each list comprehension creates only one list. That is, in order to do what we want, we must go through the list twice: first copy the selected items into a new list, then remove the selected items from the original list. To continue the script:
time_0 = time.time()
red_letters_1 = [letter for letter in letters_1 if letter["color"]=="red"]
letters_1 = [letter for letter in letters_1 if letter["color"]!="red"]
time_1 = time.time()
The problem with the for loop is that it leads to more convoluted code and that it (surprisingly) takes longer to execute:
time_2 = time.time()
red_letters_2 = []
other_letters_2 = []
for letter in letters_2:
    if letter["color"] == "red":
        red_letters_2.append(letter)
    else:
        other_letters_2.append(letter)
letters_2 = other_letters_2
time_3 = time.time()
print(time_1 - time_0)
print(time_3 - time_2)
Output:
0.011380434036254883
0.015761613845825195
Note: You can remove the need for having a second list other_letters_2 by going through the list backwards and using pop(), but this takes even longer (more than 10 times longer, actually).
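For reference, the backwards pop() variant mentioned in the note could look roughly like the sketch below. The names letters_3, red_letters_3 and total are made up for this illustration, and the sample size is reduced; this is not the exact code that was timed above.

```python
import random
import sys

colors = ["red", "orange", "yellow", "green", "blue", "purple"]
# letters_3 is a hypothetical, smaller sample list built like letters_1 above.
letters_3 = [
    {"color": random.choice(colors), "content": random.randint(0, sys.maxsize)}
    for _ in range(1000)
]
total = len(letters_3)

red_letters_3 = []
# Walk the indices from the end so that pop() never shifts an item
# that has not been visited yet.
for index in range(len(letters_3) - 1, -1, -1):
    if letters_3[index]["color"] == "red":
        red_letters_3.append(letters_3.pop(index))

# The selected items were collected back to front; restore the original order.
red_letters_3.reverse()
```

A likely reason for the slowdown is that each pop(index) from the middle of a list has to shift every element after that index, so the total work grows much faster than with a single pass that only appends.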
Question
While the possibility with two list comprehensions is clearly the fastest of these possibilities, it seems inefficient to do two list comprehensions. Is it possible to fold this into one list comprehension (without making it inefficient)? Is there another more efficient way? Or is there a reason why it is not possible to speed things up beyond the possibility with two list comprehensions?
Note on related questions
The question has been suggested to be a duplicate of this thread, where the question is about selecting two subsets of a list using list comprehension (or for loops). In this case, the only way is to test two different conditions, which may perhaps be shortened at the expense of some readability by applying a nested list comprehension as suggested in this answer.
Since (1) this solution to the suggested duplicate is not an option for the present question and (2) (as pointed out by Ev. Kounis) the present question potentially allows for different solutions by modifying the original list in the list comprehension, I submit that this is not a duplicate (not an exact one in any case). I also clarified this at the beginning of the question.
Python version: 3.6.2
Upvotes: 2
Views: 108
Reputation: 15204
How about this? (I reduced the sample size for testing, but you can crank it back up.)
import random, sys
letters_1 = []
colors = ["red", "orange", "purple"]
for i in range(10):
    letter = {"color": random.choice(colors), "content": random.randint(0, sys.maxsize)}
    letters_1.append(letter)
letters_1, letters_2 = [[x for x in letters_1 if x['color'] in i] for i in [('red', ), ("orange", "purple")]]
That is a single list-comprehension that takes advantage of variable unpacking.
Let me know how this performs in comparison. I am optimistic.
As you will also notice, the code above does not modify the original list in place. Instead it creates two new lists and rebinds the name letters_1 to the first of them (the red letters), so the original list is overwritten.
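If you want to compare the approaches yourself, a rough benchmark sketch with timeit might look like this. The function names two_comprehensions, nested_comprehension and single_loop are made up for the sketch, and the sample size and repeat count are arbitrary assumptions; timings will vary by machine and Python version, so no numbers are claimed here.

```python
import random
import sys
import timeit

colors = ["red", "orange", "purple"]
letters = [
    {"color": random.choice(colors), "content": random.randint(0, sys.maxsize)}
    for _ in range(10000)
]


def two_comprehensions(items):
    # Two separate passes, as in the question.
    red = [x for x in items if x["color"] == "red"]
    others = [x for x in items if x["color"] != "red"]
    return red, others


def nested_comprehension(items):
    # The nested comprehension with unpacking; it still makes two passes.
    red, others = [
        [x for x in items if x["color"] in group]
        for group in [("red",), ("orange", "purple")]
    ]
    return red, others


def single_loop(items):
    # One pass with an explicit for loop, as in the question.
    red, others = [], []
    for x in items:
        if x["color"] == "red":
            red.append(x)
        else:
            others.append(x)
    return red, others


for func in (two_comprehensions, nested_comprehension, single_loop):
    seconds = timeit.timeit(lambda: func(letters), number=20)
    print(f"{func.__name__}: {seconds:.4f} s")
```

Using timeit with several repetitions tends to give steadier numbers than a single pair of time.time() calls, since one-off measurements are easily skewed by whatever else the machine is doing.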
Upvotes: 1