Garrett Miller
Garrett Miller

Reputation: 119

Efficient way to compare elements in two lists?

I know this is similar to Efficient way to compare elements in 2 lists, but I have an extension on the question basically.

Say I have two lists:

a = [1,2,4,1,0,3,2]
b = [0,1,2,3,4]

I want to find out the indices of a where the element is equal to each element of b.

For instance, I would want the sample output for b[1] to tell me that a = b[1] at [0,3].

A data frame output would be useful as well, something like:

   b  index_a
   0     4
   1     0
   1     3
   2     1
   2     6
   3     5
   4     3

What I used before was:

b = pd.DataFrame(b)
a = pd.DataFrame(a)
pd.merge(b.reset_index(),a.reset_index(),
         left_on=b.columns.tolist(),
         right_on = a.columns.tolist(),
         suffixes = ('_b','_a'))['index_b','index_a']]

However, I am unsure if this is necessary since these are for lists. ( I used this method previously when I was working with dataframes ).

I am doing this operation thousands of times with much larger lists so I am wondering if there is a more efficient method.

In addition, b is just list(range(X)) where in this case X = 5

If anyone has some input I'd greatly appreciate it!

Thanks

Upvotes: 0

Views: 141

Answers (5)

DamienM
DamienM

Reputation: 9

Try this :

def listChecker(listA: list, listB: list) -> list:
"""
Checks for elements in `listA` that are not present in `listB`.

This function efficiently iterates through `listA` and adds elements not found in
`listB` to the `out_list`. It leverages set operations for both membership testing
and list creation, potentially enhancing performance in certain scenarios.

Args:
    listA (list): The first list to compare against.
    listB (list): The second list to check for common elements.

Returns:
    list: A new list containing elements from `listA` that are not present in `listB`.

Raises:
    TypeError: If either `listA` or `listB` is not of type `list`.

Examples:
    >>> listChecker([1, 2, 3, 4], [1, 3, 5])
    [2, 4]
    >>> listChecker([], [1, 2, 3])
    []
    >>> listChecker(["apple", "banana", "cherry"], ["apple", "orange"])
    ["banana", "cherry"]
"""

if not isinstance(listA, list):
    raise TypeError("listA must be a list")
if not isinstance(listB, list):
    raise TypeError("listB must be a list")

# Convert lists to sets for efficient membership testing and list creation
setA = set(listA)
setB = set(listB)

# Use set difference to efficiently find elements not in listB
missing_elements = setA.difference(setB)

return list(missing_elements)  # Convert the set back to a list

Have fun \m/

Upvotes: 0

Sam Bourne
Sam Bourne

Reputation: 650

I'm not sure if this is efficient enough for your needs, but this would work:

from collections import defaultdict

indexes = defaultdict(set)

a = [1,2,4,1,0,3,2]
b = [0,1,2,3,4]

for i, x in enumerate(a):
    indexes[x].add(i)

for x in b:
    print b, indexes.get(x)

Upvotes: 0

Bakuriu
Bakuriu

Reputation: 101909

A very simple and efficient solution is to build a mapping from the values in the range 0..N-1 to indices of a. The mapping can be a simple list, so you end up with:

indices = [[] for _ in b]
for i, x in enumerate(a):
    indices[x].append(i)

Example run:

>>> a = [1,2,4,1,0,3,2]
>>> b = [0,1,2,3,4]
>>> indices = [[] for _ in b]
>>> for i,x in enumerate(a):
...     indices[x].append(i)
... 
>>> indices[1]
[0, 3]

Note that b[i] == i so keeping the b list is pretty useless.

Upvotes: 2

Mark Ransom
Mark Ransom

Reputation: 308101

import collections
dd=collections.defaultdict(list)
for i,x in enumerate(a):
    dd[x].append(i)

>>> sorted(dd.items())
[(0, [4]), (1, [0, 3]), (2, [1, 6]), (3, [5]), (4, [2])]

Upvotes: 1

Matt Zhang
Matt Zhang

Reputation: 11

If b is sorted consecutive integers as you shown here, then bucket sort is most effective. Otherwise, you may construct a hash table, with value b as the key, and construction a list of a's as values.

Upvotes: 1

Related Questions