Reputation: 23
I have a lists of tuples (>150k elements) containing an ID with the length of that ID. However, this list should only show IDs that appear in the second list (>80k).
list_of_tuples = [('1', 31.46), ('10', 97.99), ('50', 71.19), ('100', 17.03), ...]
normal_list = ['1', '50', '100', ...]
The desired output is:
list_of_tuples = [('1', 31.46), ('50', 71.19), ('100', 17.03), ...]
Here's the code that I threw together for testing the concept, but as I am new to Python, it doesn't work. I also haven't found a solution online for this kind of issue.
for whole_elem in list_of_tuples:
for first_elem in whole_elem:
for link in normal_list:
if first_elem <> link
list_of_tuples.pop(whole_elem)
I would appreciate your support a lot. Thank you very much!
Upvotes: 1
Views: 696
Reputation: 95242
Does a list comprehension not work here?
filtered_tuples = [tpl for tpl in list_of_tuples if tpl[0] in normal_list]
Since your list is so big, you probably want to turn the normal_list
into a set to improve performance:
normal_set = frozenset(normal_list)
filtered_tuples = [tpl for tpl in list_of_tuples if tpl[0] in normal_set]
Upvotes: 3
Reputation: 16660
You can probably solve this conceptually at the same level as asked ("inner join") with pandas
join
function (look at this question).
However, here I would just do the following:
result = []
normal_set = set(normal_list) # improves performance of 'contains' check from len(normal_list)/2 (on avarage) to log(len(normal_list)
for tpl in list_of_tuples:
if tpl[0] in normal_set:
result.append(tpl)
On top of that, you can yield
elements of the result instead of append
if you do not need to consume the result as a whole:
def inner_join(normal_list, list_of_tuples):
result = []
normal_set = set(normal_list)
for tpl in list_of_tuples:
if tpl[0] in normal_set:
yield tpl
Which you use as:
for el in innner_join(normal_list, list_of_tuples):
....
Upvotes: 1
Reputation: 224886
for first_elem in whole_elem:
The first element in whole_elem
is whole_elem[0]
. for first_elem in whole_elem
would be a loop over all of its elements.
first_elem = whole_elem[0]
for link in normal_list: if first_elem <> link
This runs the body of the if
every time there’s an element in normal_list
that isn’t equal to first_elem
. Instead, you want to check whether normal_list
contains first_elem
.
if first_elem not in normal_list:
dict_link_length.pop(whole_elem)
Not sure what dict_link_length
is. Producing a new list is probably a better idea than modifying the old one, though.
filtered = []
for whole_elem in list_of_tuples:
first_elem = whole_elem[0]
if first_elem in normal_list:
filtered.append(whole_elem)
Now pick more descriptive names for things (a list of tuples is a normal list). Here’s a start:
select_ids = ['1', '50', '100', ...]
filtered = []
for item in items:
item_id = item[0]
if item_id in select_ids:
filtered.append(item)
Checking if an element is in a list with in
is slow because it has to check the entire list one at a time, but checking if an element is in a set is constant-time:
select_ids = ['1', '50', '100', ...]
filtered = []
select_ids = frozenset(select_ids)
for item in items:
item_id = item[0]
if item_id in select_ids:
filtered.append(item)
Finally, this can be written as a list comprehension:
select_ids = frozenset(select_ids)
filtered = [item for item in items if item[0] in select_ids]
Upvotes: 2