Reputation: 21
I am trying to reduce duplicates in a SQL Server database table and the criteria are complicated. All table data has been pulled into Python list
s.
In this step, I want to use a list comprehension to find a certain field value when one list
value matches the list
value of the list
that has duplicates in rows.
list
A is a unique list of dupID
s: [134L, 1610L, 1861L, 2026L, 3211L, 4134L, 4363L, 4453L, 4733L,...]
list
B is two-dimensional:
Row# dupID nameID SSN personID
[[85097L, 236479L, 241583, '999-99-0000', 359913, datetime.datetime(2012, 9, 9, 0, 0)]
[78654L, 236479L, 996783, '999-99-0000', NULL, datetime.datetime(2008, 5, 4, 0, 0)]...]
These are the loops that I want to speed up through a list comprehension:
personIDList = []
for i in range(len(A)):
for j in range(len(B)):
if A[i] == B[j][1]: # if dupID == dupID
personIDList.append(B[j][4]) # append personID
Upvotes: 1
Views: 837
Reputation: 113998
import numpy as np
A = np.array(A)
B = np.array(B)
person_ids = B[np.in1d(list(B[:,1]),A)][:,4]
I think at least ... it would be easier if you posted example A and B lists
I always like doing numpy stuff :P
we can make this more readable by breaking it apart
dup_ids_in_b = list(B[:,1]) # take column 1 from B (we use list so its not of type `object`)
boolmask_b_dups_in_a = np.in1d(dup_ids_in_b,A) # True,True,False,... True for all indices where B[i][1] is in A
person_ids = B[boolmask_b_dups_in_a][:,4] # take the fourth column of all the True indices from last step
Upvotes: 0
Reputation: 11060
Here is how you turn a single for loop into a list comprehension:
my_list = []
for i in something:
my_list.append(i+7)
goes to
my_list = [i+7 for i in something]
Here is how you turn nested for loops into a list comprehension:
my_list = []
for i in first_thing:
for j in second_thing:
my_list.append(i + j)
goes to
my_list = [i + j for i in first_thing for j in second_thing]
So in your case you want to do this:
personIDList = [b[4] for a in A for b in B if a == b[1]]
Upvotes: 0
Reputation: 250991
Convert the unique ID list to a set first:
s = set(A)
Then iterate over the other list using a list comprehension:
personIDList = [item[4] for item in B if item[1] in s]
This is going to be O(N)
in complexity compared to your approach which is O(N**2)
.
Upvotes: 3
Reputation: 387825
First of all, you would iterate over the elements not the indexes, so you would get this:
personIDList = []
for a in A:
for b in B:
if a == b[1]:
personIDList.append(b[4])
This can be then easily turned into a list comprehension:
personIDList = [b[4] for a in A for b in B if a == b[1]]
Upvotes: 1