Reputation: 119
I know this is similar to Efficient way to compare elements in 2 lists, but I have an extension on the question basically.
Say I have two lists:
a = [1,2,4,1,0,3,2]
b = [0,1,2,3,4]
I want to find out the indices of a
where the element is equal to each element of b
.
For instance, I would want the sample output for b[1]
to tell me that a = b[1]
at [0,3]
.
A data frame output would be useful as well, something like:
b index_a
0 4
1 0
1 3
2 1
2 6
3 5
4 3
What I used before was:
b = pd.DataFrame(b)
a = pd.DataFrame(a)
pd.merge(b.reset_index(),a.reset_index(),
left_on=b.columns.tolist(),
right_on = a.columns.tolist(),
suffixes = ('_b','_a'))['index_b','index_a']]
However, I am unsure if this is necessary since these are for lists. ( I used this method previously when I was working with dataframes ).
I am doing this operation thousands of times with much larger lists so I am wondering if there is a more efficient method.
In addition, b is just list(range(X))
where in this case X = 5
If anyone has some input I'd greatly appreciate it!
Thanks
Upvotes: 0
Views: 141
Reputation: 9
Try this :
def listChecker(listA: list, listB: list) -> list:
"""
Checks for elements in `listA` that are not present in `listB`.
This function efficiently iterates through `listA` and adds elements not found in
`listB` to the `out_list`. It leverages set operations for both membership testing
and list creation, potentially enhancing performance in certain scenarios.
Args:
listA (list): The first list to compare against.
listB (list): The second list to check for common elements.
Returns:
list: A new list containing elements from `listA` that are not present in `listB`.
Raises:
TypeError: If either `listA` or `listB` is not of type `list`.
Examples:
>>> listChecker([1, 2, 3, 4], [1, 3, 5])
[2, 4]
>>> listChecker([], [1, 2, 3])
[]
>>> listChecker(["apple", "banana", "cherry"], ["apple", "orange"])
["banana", "cherry"]
"""
if not isinstance(listA, list):
raise TypeError("listA must be a list")
if not isinstance(listB, list):
raise TypeError("listB must be a list")
# Convert lists to sets for efficient membership testing and list creation
setA = set(listA)
setB = set(listB)
# Use set difference to efficiently find elements not in listB
missing_elements = setA.difference(setB)
return list(missing_elements) # Convert the set back to a list
Have fun \m/
Upvotes: 0
Reputation: 650
I'm not sure if this is efficient enough for your needs, but this would work:
from collections import defaultdict
indexes = defaultdict(set)
a = [1,2,4,1,0,3,2]
b = [0,1,2,3,4]
for i, x in enumerate(a):
indexes[x].add(i)
for x in b:
print b, indexes.get(x)
Upvotes: 0
Reputation: 101909
A very simple and efficient solution is to build a mapping from the values in the range 0..N-1
to indices of a
. The mapping can be a simple list, so you end up with:
indices = [[] for _ in b]
for i, x in enumerate(a):
indices[x].append(i)
Example run:
>>> a = [1,2,4,1,0,3,2]
>>> b = [0,1,2,3,4]
>>> indices = [[] for _ in b]
>>> for i,x in enumerate(a):
... indices[x].append(i)
...
>>> indices[1]
[0, 3]
Note that b[i] == i
so keeping the b
list is pretty useless.
Upvotes: 2
Reputation: 308101
import collections
dd=collections.defaultdict(list)
for i,x in enumerate(a):
dd[x].append(i)
>>> sorted(dd.items())
[(0, [4]), (1, [0, 3]), (2, [1, 6]), (3, [5]), (4, [2])]
Upvotes: 1
Reputation: 11
If b is sorted consecutive integers as you shown here, then bucket sort is most effective. Otherwise, you may construct a hash table, with value b as the key, and construction a list of a's as values.
Upvotes: 1