Python compare list to list of lists to check for duplicates

Question

I'm trying to update a CSV file by loading it in and then checking it against an existing list of lists. The data is read in like this by using csv.reader:

[["Name","URL","Team"],["Name","URL","Team"]]

I have another list of lists in the same format:

[["Name","URL","Team"],["Name","URL","Team"]]

My goal is to only add what doesn't already exist in the CSV file (first list of lists). I've tried the following, but it doesn't work:

reader = csv.reader(file)
fantasyList = [["Name","URL","Team"],["Name","URL","Team"]] 

new_list = []

for elem in reader:
    if elem not in new_list:
        new_list.append(elem)

for elem in fantasyList:
    if elem not in new_list:
        new_list.append(elem)

So how can I add a list from a list of lists to a separate list of lists without duplicating entries?

CLARIFICATION BELOW

Here is my problem:

I have a list of lists. It's a list which contains lists. Here's an example:

[["Bob the Builder", "Google", "Free Breakfast"],
["I down vote because I'm better than you", "Actually dude, your question kinda blows. Just clean it up and we'll upvote it.", "Is this better? because I Dunno LOL ¯\(°_o)/¯"]]

If that isn't clear enough, see the Python documentation on data structures for a better explanation of what a "list" is in Python. So now that you understand what a list is, imagine putting a list inside another list, and then repeating the process by adding more lists to the master list. You'd end up with a "list of lists."

Now let's say I have a CSV file that I want to load into my Python program. The easy way to do so is to use the csv module (see the csv module documentation). Here's an example of how to load a CSV file:

import csv

f = open(r'foo/bar.csv', 'a+')

reader = csv.reader(f)

So the above code demonstrates how to load a CSV as a reader object. Here is an example of how to iterate over the rows of the CSV:

import csv

f = open(r'foo/bar.csv', 'r')

reader = csv.reader(f)

for row in reader:
    print(row)

The above demonstrates how to print all of the rows of your CSV. So now imagine a CSV that has three columns. When you load the CSV using the csv module, you get a reader object. Let's say you want to update this CSV by adding rows that don't already exist. We don't want to create duplicate entries in our CSV file. So let's go ahead and take a look at what should, in theory, work:

import csv

f = open(r'foo/bar.csv', 'r')
#initialize a temporary list to transfer the CSV contents to 
tempList = []

#initialize the final list to be saved to the CSV
finalList =[]

reader = csv.reader(f)

for row in reader:
     tempList.append(row)

#close the file to avoid read/write errors
f.close()

Okay, good. Now our tempList should be a "list of lists" (see above examples) which represents the three columns of each row of the CSV. So now let's take a look at our other "list of lists" which we want to join with our tempList:

fantasyList = [["Pushing Will Protect You","Shoving Will Protect you", "Do you have stairs in your house?"],["Pak","Chooi","Unf"]]

As you can see, the above demonstrates a "list of lists." Now consider the following code:

#For each list in tempList, check if it exists in the final list

for elem in tempList:
    if elem not in finalList:
        finalList.append(elem)

for elem in fantasyList:
    if elem not in finalList:
        finalList.append(elem)

The above code should not create any duplicate entries in our finalList. For example, let's say tempList contains the list ["Pak","Chooi","Unf"]. Now let's say fantasyList also contains ["Pak","Chooi","Unf"]. After the above is run, finalList should not contain two entries of ["Pak","Chooi","Unf"]. So then the final step to update our CSV is to write our finalList to file:

f = open(r'foo/bar.csv', 'w')

wr = csv.writer(f, dialect='excel')

wr.writerows(finalList)

f.close()

But herein lies my problem: I find duplicate entries in my CSV. For whatever reason, I cannot figure out what isn't working here.

Burhan Khalid · Accepted Answer

I don't know what the actual problem is with your code, because it's not clear from the snippets you provided.
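One thing that may be worth checking (this is only a guess, since the question doesn't show the actual file contents): two rows that look the same when printed are still different lists if any cell differs by stray whitespace. The rows below are hypothetical and only illustrate how `in` compares lists element by element:

```python
csv_row = ['Pak', 'Chooi', 'Unf ']   # hypothetical row as read from the file
new_row = ['Pak', 'Chooi', 'Unf']    # hypothetical row built in code

# List membership uses ==, so a single trailing space makes the rows differ.
is_duplicate = new_row in [csv_row]
print(is_duplicate)          # False
print(repr(csv_row[2]))      # 'Unf '  (repr makes the extra space visible)

# Stripping each cell before comparing makes the rows match.
cleaned = [cell.strip() for cell in csv_row]
print(new_row in [cleaned])  # True
```

Printing `repr(row)` for a few rows from both lists is a quick way to see whether this applies to your data.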

However, here is an alternative solution to your problem. Create a set(), which will enforce uniqueness of items. This way it doesn't matter what you're adding; only unique things will get added to the master set.

Here is a small example:

>>> first = [['a','b','c'],['d','e','f']]
>>> second = [['a','b','c'],['d','e','f'],[1,2,3],['g','h','i']]
>>> master = set(tuple(i) for i in first)
>>> master.update((tuple(i) for i in second))
>>> list(master)
[('g', 'h', 'i'), ('a', 'b', 'c'), ('d', 'e', 'f'), (1, 2, 3)]

As you may have noticed, sets are not ordered; if this becomes an issue, you can always impose an order later by converting the set to a list and running sorted on it.

The second thing you may have noticed is that you end up with a list of tuples (sets can only contain hashable types). This won't have much impact if all you are doing is writing the combined file back out.
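If both the original order and plain lists matter, a rough sketch along these lines may help. It keeps a set of tuples only to track what has been seen and builds the result as a list of lists. The function name `merge_unique` is made up for this example:

```python
def merge_unique(existing, new):
    """Return rows from `existing` followed by rows from `new`, skipping
    any row already seen. Order of first appearance is preserved."""
    seen = set()
    merged = []
    for row in list(existing) + list(new):
        key = tuple(row)  # lists are unhashable, so use a tuple as the set key
        if key not in seen:
            seen.add(key)
            merged.append(list(row))
    return merged


first = [['a', 'b', 'c'], ['d', 'e', 'f']]
second = [['a', 'b', 'c'], ['d', 'e', 'f'], [1, 2, 3], ['g', 'h', 'i']]
merged = merge_unique(first, second)
print(merged)
# [['a', 'b', 'c'], ['d', 'e', 'f'], [1, 2, 3], ['g', 'h', 'i']]
```

Unlike a bare set, this also removes duplicates that appear more than once within the same input list, and the membership test stays fast as the lists grow.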

The other way to do this is with list comprehensions:

>>> master = [i for i in second if i not in first] + first
>>> master
[[1, 2, 3], ['g', 'h', 'i'], ['a', 'b', 'c'], ['d', 'e', 'f']]
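Putting the pieces together, here is a minimal end-to-end sketch of the read, merge, and write cycle from the question. It assumes Python 3 (for `newline=''`), and it assumes that cells differing only by surrounding whitespace should count as the same entry; drop the `strip()` if that is not what you want. The name `update_csv` and the demo data are made up for illustration:

```python
import csv
import os
import tempfile


def update_csv(path, new_rows):
    """Rewrite the CSV at `path` so it holds its old rows plus any rows
    from `new_rows` that were not already present."""
    with open(path, 'r', newline='') as f:
        # csv.reader yields [] for blank lines, so filter those out.
        rows = [row for row in csv.reader(f) if row]

    seen = set()
    final_rows = []
    for row in rows + [list(r) for r in new_rows]:
        # Assumption: surrounding whitespace is not significant.
        key = tuple(str(cell).strip() for cell in row)
        if key not in seen:
            seen.add(key)
            final_rows.append(list(key))

    with open(path, 'w', newline='') as f:
        csv.writer(f, dialect='excel').writerows(final_rows)
    return final_rows


# Demo on a throwaway file with hypothetical data.
fd, demo_path = tempfile.mkstemp(suffix='.csv')
os.close(fd)
with open(demo_path, 'w', newline='') as f:
    csv.writer(f).writerows([['Pak', 'Chooi', 'Unf'], ['Bob', 'Google', 'Free']])

result = update_csv(demo_path, [['Pak', 'Chooi', 'Unf '], ['New', 'Row', 'Here']])
print(result)
# [['Pak', 'Chooi', 'Unf'], ['Bob', 'Google', 'Free'], ['New', 'Row', 'Here']]
os.remove(demo_path)
```

Using `with` blocks closes the file before it is reopened for writing, and opening with `newline=''` is what the csv module documentation recommends for both reading and writing.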
