RebeccaRol

Reputation: 123

Compare one column of a 2D array and remove duplicates in Python

Say I have a 2D array like:

array = [['abc',2,3],
        ['abc',2,3],
        ['bb',5,5],
        ['bb',4,6],
        ['sa',3,5],
        ['tt',2,1]]

I want to remove every row whose first-column value appears more than once,
i.e. compare array[x][0] across rows and return only:

removeDups = [['sa',3,5],
        ['tt',2,1]]

I think it should be something like this (store the first column of a row in a temporary variable, compare it with the remaining rows, then set array to whatever compare returns):

for x in range(len(array)):
    tmpCol = array[x][0] 
    del array[x] 
    removed = compare(array, tmpCol) 
    array = copy.deepcopy(removed) 

print repr(len(removed))  #testing 

where compare is (compare the first column of each remaining row with tmpCol; if it matches, remove that row, otherwise return the original array):

def compare(valid, tmpCol):
    for x in range(len(valid)):
        if valid[x][0] != tmpCol:
            del valid[x]
            return valid
        else:
            return valid

I keep getting an 'index out of range' error. I've tried other ways of doing this without success, so I would really appreciate some help!

Upvotes: 0

Views: 1507

Answers (3)

Ben Schmidt

Reputation: 401

Similar to the other answers, but using a plain dictionary instead of importing Counter:

counts = {}

for elem in array:
    # add 1 to counts for this string, creating new element at this key
    # with initial value of 0 if needed
    counts[elem[0]] = counts.get(elem[0], 0) + 1

new_array = []
for elem in array:
    # check that there's only 1 instance of this element.
    if counts[elem[0]] == 1:
        new_array.append(elem)
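
As a side note on the 'index out of range' error in the question: the loop there iterates over range(len(array)), which is fixed when the loop starts, while del keeps shrinking the list. The sketch below reproduces that effect in isolation and wraps the two-pass dictionary approach in a function. The names shrinking_delete and keep_unique_first_column are made up for this illustration and are not part of the question or of any library.

```python
def shrinking_delete(rows):
    """Delete by index while the list shrinks, as the question's loop does."""
    rows = list(rows)  # work on a copy so the caller's list is untouched
    for i in range(len(rows)):  # range is fixed at the original length
        del rows[i]             # the list gets one element shorter each pass
    return rows


def keep_unique_first_column(rows):
    """Return a new list with the rows whose first value occurs exactly once."""
    counts = {}
    for row in rows:
        counts[row[0]] = counts.get(row[0], 0) + 1
    # Build a new list instead of deleting from the one being iterated over.
    return [row for row in rows if counts[row[0]] == 1]


array = [['abc', 2, 3],
         ['abc', 2, 3],
         ['bb', 5, 5],
         ['bb', 4, 6],
         ['sa', 3, 5],
         ['tt', 2, 1]]

try:
    shrinking_delete(array)
except IndexError as exc:
    print("IndexError:", exc)

print(keep_unique_first_column(array))
# [['sa', 3, 5], ['tt', 2, 1]]
```

Because the second function never mutates its input, the index can never run past the end of the list.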

Upvotes: 1

MMF

Reputation: 5921

You can use a dictionary and count the occurrences of each key. You can also use Counter from the collections module, which does exactly this.

Do as follows:

from collections import Counter

# Count each first-column value once, rather than on every iteration
counts = Counter(k for k, _, _ in array)

removed = []
for k, val1, val2 in array:
    if counts[k] == 1:
        removed.append([k, val1, val2])

Upvotes: 0

akuiper

Reputation: 215067

One option is to build a counter for the first column of your array beforehand and then filter the list based on the count, i.e., keep a row only if its first element appears exactly once:

from collections import Counter

count = Counter(a[0] for a in array)
[a for a in array if count[a[0]] == 1]
# [['sa', 3, 5], ['tt', 2, 1]]
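
If the column to compare is not always the first one, the same idea can take the column index as a parameter. This is only a sketch: keep_unique_by_column and its col argument are hypothetical names introduced here, and it assumes every row is long enough to have that column.

```python
from collections import Counter


def keep_unique_by_column(rows, col=0):
    """Keep the rows whose value in column `col` occurs exactly once."""
    counts = Counter(row[col] for row in rows)
    return [row for row in rows if counts[row[col]] == 1]


array = [['abc', 2, 3],
         ['abc', 2, 3],
         ['bb', 5, 5],
         ['bb', 4, 6],
         ['sa', 3, 5],
         ['tt', 2, 1]]

print(keep_unique_by_column(array))
# [['sa', 3, 5], ['tt', 2, 1]]

print(keep_unique_by_column(array, col=1))
# [['bb', 5, 5], ['bb', 4, 6], ['sa', 3, 5]]
```

Counting is a single pass and each lookup is a dictionary access, so the whole thing stays linear in the number of rows.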

Upvotes: 1
