Albert
Albert

Reputation: 21

Python List Comprehension for Database Data

I am trying to reduce duplicates in a SQL Server database table and the criteria are complicated. All table data has been pulled into Python lists.

In this step, I want to use a list comprehension to find a certain field value when one list value matches the list value of the list that has duplicates in rows.

list A is a unique list of dupIDs: [134L, 1610L, 1861L, 2026L, 3211L, 4134L, 4363L, 4453L, 4733L,...]

list B is two-dimensional:

  Row#    dupID    nameID   SSN           personID

[[85097L, 236479L, 241583, '999-99-0000', 359913, datetime.datetime(2012, 9, 9, 0, 0)]

 [78654L, 236479L, 996783, '999-99-0000', NULL,   datetime.datetime(2008, 5, 4, 0, 0)]...]

These are the loops that I want to speed up through a list comprehension:

personIDList = []
for i in range(len(A)):
        for j in range(len(B)):
             if A[i] == B[j][1]: # if dupID == dupID
                   personIDList.append(B[j][4]) # append personID

Upvotes: 1

Views: 837

Answers (4)

Joran Beasley
Joran Beasley

Reputation: 113998

import numpy as np
A = np.array(A)
B = np.array(B)
person_ids = B[np.in1d(list(B[:,1]),A)][:,4]

I think at least ... it would be easier if you posted example A and B lists

I always like doing numpy stuff :P

we can make this more readable by breaking it apart

dup_ids_in_b = list(B[:,1]) # take column 1 from B (we use list so its not of type `object`)
boolmask_b_dups_in_a = np.in1d(dup_ids_in_b,A) # True,True,False,... True for all indices where B[i][1] is in A
person_ids = B[boolmask_b_dups_in_a][:,4] # take the fourth column of all the True indices from last step

Upvotes: 0

rlms
rlms

Reputation: 11060

Here is how you turn a single for loop into a list comprehension:

my_list = []
for i in something:
    my_list.append(i+7)

goes to

my_list = [i+7 for i in something]

Here is how you turn nested for loops into a list comprehension:

my_list = []
for i in first_thing:
    for j in second_thing:
        my_list.append(i + j)

goes to

my_list = [i + j for i in first_thing for j in second_thing]

So in your case you want to do this:

personIDList = [b[4] for a in A for b in B if a == b[1]]

Upvotes: 0

Ashwini Chaudhary
Ashwini Chaudhary

Reputation: 250991

Convert the unique ID list to a set first:

s = set(A)

Then iterate over the other list using a list comprehension:

personIDList = [item[4] for item in B if item[1] in s]

This is going to be O(N) in complexity compared to your approach which is O(N**2).

Upvotes: 3

poke
poke

Reputation: 387825

First of all, you would iterate over the elements not the indexes, so you would get this:

personIDList = []
for a in A:
    for b in B:
        if a == b[1]:
            personIDList.append(b[4])

This can be then easily turned into a list comprehension:

personIDList = [b[4] for a in A for b in B if a == b[1]]

Upvotes: 1

Related Questions