goldisfine

Reputation: 4850

Removing duplicate entries?

I need to compare values from different rows. Each row is a dictionary, and I need to compare the values in adjacent rows for the key 'flag'. How would I do this? Simply saying:

for row in range(1, len(myjson)):
    if row['flag'] == (row-1)['flag']:
        print 'yes'

raises TypeError: 'int' object is not subscriptable

Even though range returns a list of ints...
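Iterating over range(...) yields plain integers, so row is an int, and row['flag'] tries to subscript an int. That is what raises the TypeError. Here is a minimal Python 3 sketch of the index-based approach, assuming myjson is a list of dicts that each have a 'flag' key. The function name adjacent_flag_matches and the sample rows are hypothetical.

```python
def adjacent_flag_matches(rows):
    """Return (i - 1, i) index pairs whose 'flag' values are equal."""
    matches = []
    # range yields integer indices; use them to index into the list
    for i in range(1, len(rows)):
        if rows[i]['flag'] == rows[i - 1]['flag']:
            matches.append((i - 1, i))
    return matches


# Hypothetical sample rows, trimmed down to the 'flag' key
myjson = [{'flag': 0}, {'flag': 1}, {'flag': 3}, {'flag': 3}, {'flag': 3}, {'flag': 4}]
print(adjacent_flag_matches(myjson))  # [(2, 3), (3, 4)]
```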


RESPONSE TO COMMENTS:

The list of rows is a list of dictionaries. Originally, I import a tab-delimited file and read it in with csv.DictReader, so that it is a list of dictionaries whose keys correspond to the column (variable) names.
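The loading code isn't shown, so here is a Python 3 sketch under stated assumptions. The column names and the in-memory sample are hypothetical, and io.StringIO stands in for the real file. csv.DictReader accepts a delimiter argument for tab-separated input. It returns every field as a string, so 'flag' has to be converted explicitly to get integer flags like the ones in the printed output.

```python
import csv
import io

# Hypothetical tab-delimited input; in practice this would be an open file
# (e.g. open('contacts.tsv', newline='')), and these column names are assumptions.
SAMPLE = "name\tflag\temail\nDiane Grant Albrecht M.S.\t0\t\nSam D. Man Ed.M.\t3\t\n"


def load_rows(text_stream):
    """Read tab-delimited rows into a list of dicts keyed by the header row."""
    reader = csv.DictReader(text_stream, delimiter='\t')
    rows = []
    for row in reader:
        # DictReader returns every value as a string; convert 'flag' to int
        row['flag'] = int(row['flag'])
        rows.append(row)
    return rows


rows = load_rows(io.StringIO(SAMPLE))
print(rows[1]['flag'])  # 3
```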

Code: (where myjson is a list of dictionaries)

for row in myjson:
    print row

Output:

{'website': '', 'phone': '', 'flag': 0, 'name': 'Diane Grant Albrecht M.S.', 'email': ''}
{'website': 'www.got.com', 'phone': '111-222-3333', 'flag': 1, 'name': 'Lannister G. Cersei M.A.T., CEP', 'email': '[email protected]'}
{'website': '', 'phone': '', 'flag': 2, 'name': 'Argle D. Bargle Ed.M.', 'email': ''}
{'website': 'www.daManWithThePlan.com', 'phone': '000-000-1111', 'flag': 3, 'name': 'Sam D. Man Ed.M.', 'email': '[email protected]'}
{'website': '', 'phone': '', 'flag': 3, 'name': 'Sam D. Man Ed.M.', 'email': ''}
{'website': 'www.daManWithThePlan.com', 'phone': '111-222-333', 'flag': 3, 'name': 'Sam D. Man Ed.M.', 'email': '[email protected]'}
{'website': '', 'phone': '', 'flag': 4, 'name': 'D G Bamf M.S.', 'email': ''}
{'website': '', 'phone': '', 'flag': 5, 'name': 'Amy Tramy Lamy Ph.D.', 'email': ''}

Also:

type(myjson)

<type 'list'>

Upvotes: 0

Views: 134

Answers (6)

Ashwini Chaudhary

Reputation: 250891

For comparing adjacent items you can use zip:

Example:

>>> lis = [1,1,2,3,4,4,5,6,7,7]
>>> for x, y in zip(lis, lis[1:]):
...     if x == y:
...         print x, y, 'are equal'
...
1 1 are equal
4 4 are equal
7 7 are equal

For your list of dictionaries, you can do something like:

from itertools import izip
it1 = iter(list_of_dicts)
it2 = iter(list_of_dicts)
next(it2)  # advance the second iterator so it is one row ahead of it1
for x, y in izip(it1, it2):
    if x['flag'] == y['flag']:
        print 'yes'
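Python 3 has no izip, because the built-in zip is already lazy there. The two-iterator trick also generalises to any iterable, such as a csv reader, via itertools.tee. This is the "pairwise" recipe from the itertools documentation, and Python 3.10+ ships it as itertools.pairwise. A minimal sketch follows, with a hand-rolled pairwise and sample rows for illustration:

```python
from itertools import tee


def pairwise(iterable):
    """Return an iterator over overlapping pairs (a, b), (b, c), ..."""
    first, second = tee(iterable)
    next(second, None)  # advance the second copy by one; no-op if empty
    return zip(first, second)  # zip is lazy in Python 3, like izip in Python 2


rows = [{'flag': 2}, {'flag': 3}, {'flag': 3}, {'flag': 4}]
for prev_row, row in pairwise(rows):
    if prev_row['flag'] == row['flag']:
        print('yes', row['flag'])  # prints: yes 3
```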

Update:

For more than 2 adjacent items you can use itertools.groupby:

>>> from itertools import groupby
>>> lis = [1,1,1,1,1,2,2,3,4]
>>> for k, group in groupby(lis):
...     print list(group)
...

[1, 1, 1, 1, 1]
[2, 2]
[3]
[4]

For your data (myjson) it would be:

>>> for k, group in groupby(myjson, key=lambda x: x['flag']):
...     print list(group)
...     
[{'website': '', 'phone': '', 'flag': 0, 'name': 'Diane Grant Albrecht M.S.', 'email': ''}]
[{'website': 'www.got.com', 'phone': '111-222-3333', 'flag': 1, 'name': 'Lannister G. Cersei M.A.T., CEP', 'email': '[email protected]'}]
[{'website': '', 'phone': '', 'flag': 2, 'name': 'Argle D. Bargle Ed.M.', 'email': ''}]
[{'website': 'www.daManWithThePlan.com', 'phone': '000-000-1111', 'flag': 3, 'name': 'Sam D. Man Ed.M.', 'email': '[email protected]'}, {'website': '', 'phone': '', 'flag': 3, 'name': 'Sam D. Man Ed.M.', 'email': ''}, {'website': 'www.daManWithThePlan.com', 'phone': '111-222-333', 'flag': 3, 'name': 'Sam D. Man Ed.M.', 'email': '[email protected]'}]
[{'website': '', 'phone': '', 'flag': 4, 'name': 'D G Bamf M.S.', 'email': ''}]
[{'website': '', 'phone': '', 'flag': 5, 'name': 'Amy Tramy Lamy Ph.D.', 'email': ''}]
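groupby only merges *consecutive* items with the same key. It works directly on the sample above because the rows are already ordered by flag. For data that isn't ordered, a Python 3 sketch would sort by the key first and then keep one row per group. The helper name first_per_flag and the sample rows are hypothetical. Note that the result comes back ordered by flag, not in the original row order.

```python
from itertools import groupby
from operator import itemgetter


def first_per_flag(rows):
    """Keep the first row for each distinct 'flag' value.

    groupby only merges consecutive rows with the same key, so the rows are
    sorted by 'flag' first; sorted() is stable, so within each flag the
    original relative order is preserved.
    """
    by_flag = itemgetter('flag')
    ordered = sorted(rows, key=by_flag)
    return [next(group) for _, group in groupby(ordered, key=by_flag)]


rows = [{'flag': 3, 'n': 'a'}, {'flag': 1, 'n': 'b'}, {'flag': 3, 'n': 'c'}]
print(first_per_flag(rows))  # [{'flag': 1, 'n': 'b'}, {'flag': 3, 'n': 'a'}]
```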

Upvotes: 2

Pawel Miech

Reputation: 7822

It's simple. If you need to remove the dicts that share the same value for the key "flag", as the title of your post suggests (the title is somewhat misleading, because your dictionaries are not strictly speaking duplicates), you can loop over the whole list of dictionaries and keep track of the flags you have already seen in a separate collection. If an item has a flag that is already in that collection, simply don't add it. It would look something like:

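def filterDicts(listOfDicts):
    result = []
    flags = set()  # flags seen so far; a set makes the membership test O(1)
    for di in listOfDicts:
        # keep only the first dict carrying each flag value
        if di["flag"] not in flags:
            result.append(di)
            flags.add(di["flag"])
    return result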

When called with the list of dictionaries you provided, it returns a list of 6 items, each with a unique flag value (0 through 5).
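A Python 3 variant of the same keep-the-first-occurrence idea uses a dict keyed by flag instead of a separate list. The name filter_dicts is hypothetical. dict.setdefault only stores a row the first time its flag is seen, and since Python 3.7 dicts preserve insertion order. The result therefore keeps the original order of first appearance.

```python
def filter_dicts(list_of_dicts):
    """Keep the first dict seen for each 'flag' value, in original order."""
    first_by_flag = {}
    for d in list_of_dicts:
        # setdefault only stores d if this flag hasn't been seen yet
        first_by_flag.setdefault(d['flag'], d)
    # dicts preserve insertion order (guaranteed since Python 3.7)
    return list(first_by_flag.values())


flags = [0, 1, 2, 3, 3, 3, 4, 5]  # the flag column from the question's output
rows = [{'flag': f} for f in flags]
print(len(filter_dicts(rows)))  # 6
```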

Upvotes: 0

Sheng

Reputation: 3555

You could try this:

pre_item = list_of_rows[0]['flag']  # flag of the previous row
for row in list_of_rows[1:]:
    if row['flag'] == pre_item:
        print 'yes'
    pre_item = row['flag']
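The snippet above raises IndexError when the list is empty, because it reads list_of_rows[0] up front. One way around that, sketched here in Python 3, is to start pre_item as a sentinel object that never equals a real flag. The function name rows_repeating_previous_flag is hypothetical. Because the sketch no longer slices the list, it also works on iterators such as a csv reader.

```python
_MISSING = object()  # sentinel that never compares equal to a real flag value


def rows_repeating_previous_flag(list_of_rows):
    """Return rows whose 'flag' equals the flag of the row just before them."""
    repeats = []
    pre_item = _MISSING  # no previous row yet, so the first row cannot match
    for row in list_of_rows:
        if row['flag'] == pre_item:
            repeats.append(row)
        pre_item = row['flag']
    return repeats


rows = [{'flag': 1}, {'flag': 3}, {'flag': 3}]
print(len(rows_repeating_previous_flag(rows)))  # 1
print(rows_repeating_previous_flag([]))  # []
```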

Upvotes: 1

Alfe

Reputation: 59416

list_of_rows = [ { 'a': 'foo',
                   'flag': 'bar' },
                 { 'a': 'blo',
                   'flag': 'bar' } ]
for row, successor_row in zip(list_of_rows, list_of_rows[1:]):
    if row['flag'] == successor_row['flag']:
        print "yes"

Upvotes: 0

mkvcvc

Reputation: 1565

Looks like you want to access list elements in batches:
http://code.activestate.com/recipes/303279/
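The linked recipe isn't reproduced here, and the following is not necessarily what it does. If "batches" is read as overlapping windows of adjacent items, which generalises the pairwise comparison to more than two rows, a Python 3 sketch with collections.deque could look like this. sliding_window is a hypothetical helper name, and n is assumed to be at least 1.

```python
from collections import deque
from itertools import islice


def sliding_window(iterable, n):
    """Yield overlapping tuples of n consecutive items (n assumed >= 1)."""
    it = iter(iterable)
    window = deque(islice(it, n), maxlen=n)
    if len(window) == n:
        yield tuple(window)
    for item in it:
        window.append(item)  # maxlen drops the oldest item automatically
        yield tuple(window)


flags = [2, 3, 3, 3, 4]
# Start positions where three adjacent rows share the same flag
runs = [i for i, w in enumerate(sliding_window(flags, 3)) if len(set(w)) == 1]
print(runs)  # [1]
```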

Upvotes: 1

Martijn Pieters

Reputation: 1121446

Your exception indicates that list_of_rows is not what you think it is.

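To look at adjacent rows, provided list_of_rows is indeed a list, I'd use enumerate() to include the current index and then use that index to load the previous and next rows:

for i, row in enumerate(list_of_rows):
    # i == 0 has no previous row (list_of_rows[-1] would wrap around to the last row)
    previous = list_of_rows[i - 1] if i else None
    # the last row has no next row; named next_row to avoid shadowing the built-in next()
    next_row = list_of_rows[i + 1] if i + 1 < len(list_of_rows) else None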
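A complete Python 3 sketch built on the same enumerate() pattern collects the indices of rows whose flag matches either neighbour. The function name neighbours_share_flag is hypothetical. The sample uses only the flag column from the question's output.

```python
def neighbours_share_flag(list_of_rows):
    """Return indices of rows whose flag matches the previous or next row."""
    hits = []
    for i, row in enumerate(list_of_rows):
        # guard i == 0 so index -1 doesn't wrap around to the last row
        previous = list_of_rows[i - 1] if i else None
        next_row = list_of_rows[i + 1] if i + 1 < len(list_of_rows) else None
        matches_prev = previous is not None and previous['flag'] == row['flag']
        matches_next = next_row is not None and next_row['flag'] == row['flag']
        if matches_prev or matches_next:
            hits.append(i)
    return hits


flags = [0, 1, 2, 3, 3, 3, 4, 5]  # flag column from the question's sample
print(neighbours_share_flag([{'flag': f} for f in flags]))  # [3, 4, 5]
```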

Upvotes: 1
