mevatron

Reputation: 14011

Remove duplicate keys from a dictionary list

I am new to Python, and I am currently stumped by this problem:

I have a list of dictionaries generated by csv.DictReader. I create the list with the following function:

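import csv

def csvToDictList(filename):
    # newline='' is what the csv module expects on Python 3
    # (the original 'rb' mode only works on Python 2)
    with open(filename, newline='') as f:
        reader = csv.DictReader(f)
        rows = list(reader)  # named rows to avoid shadowing the built-in list
        fieldnames = reader.fieldnames  # read while the file is still open

    return (rows, fieldnames)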

This worked great, but the CSV file I am processing has duplicate columns, so I end up with a list of dictionaries like:

[
{'Column1': 'Value1', 'Column2': 'Value2', ... <some unique columns and values> ..., 'Column1': 'Value1', 'Column2': 'Value2'},
...
{'Column1': 'Value1N', 'Column2': 'Value2N', ... <some unique columns and values> ..., 'Column1': 'Value1N', 'Column2': 'Value2N'}
]

My main question is: how do I remove the duplicate columns from this list of dictionaries?

I thought about iterating over the keys and removing a column whenever I detect a duplicate key name, with something like this:

def removeColumn(dictList, colName):
    for row in dictList:
        del row[colName]

But won't that remove both columns? Should I be operating on the hash keys of the dictionary instead? Any help is appreciated!

EDIT: The duplicates I was seeing were actually in the reader.fieldnames list. I had assumed the dictionaries contained these columns as well, which was incorrect.
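For anyone hitting the same thing, here is a minimal sketch of getting a duplicate-free column list out of reader.fieldnames while keeping the original order. The helper name unique_fieldnames, the in-memory sample data, and the "Other" column are made up for illustration and are not part of my real file.

```python
import csv
import io

def unique_fieldnames(fieldnames):
    """Return the field names with duplicates removed, keeping first-seen order."""
    seen = set()
    unique = []
    for name in fieldnames:
        if name not in seen:
            seen.add(name)
            unique.append(name)
    return unique

# Hypothetical sample standing in for the real CSV file
sample = io.StringIO("Column1,Column2,Other,Column1,Column2\nv1,v2,x,v1,v2\n")
reader = csv.DictReader(sample)
rows = list(reader)

print(reader.fieldnames)                     # still lists the repeated names
print(unique_fieldnames(reader.fieldnames))  # each name once, in file order
```

The row dictionaries themselves need no cleanup, since a dict can hold each key only once.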

Upvotes: 0

Views: 2734

Answers (1)

eumiro

Reputation: 212835

There is no such thing as a duplicate key in a dictionary: each key can appear only once.

If several columns share the same name, DictReader keeps only the last one (its value overwrites the earlier ones).

For the following CSV file:

a,b,c,a,b
1,2,3,4,5
6,7,8,9,10

DictReader returns the following dicts:

{'a': '4', 'c': '3', 'b': '5'}
{'a': '9', 'c': '8', 'b': '10'}

so the earlier values for columns a and b are discarded.
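The behaviour above can be reproduced with the short sketch below, which also shows one possible way to keep every column if the discarded values matter. It assumes Python 3 and reads the sample from an in-memory string. The helper rename_duplicates and its "_2" suffix scheme are hypothetical choices for illustration, not something the csv module provides, and the scheme assumes no column is already named e.g. a_2.

```python
import csv
import io

data = "a,b,c,a,b\n1,2,3,4,5\n6,7,8,9,10\n"

# DictReader: values from later duplicate columns overwrite earlier ones.
dict_rows = list(csv.DictReader(io.StringIO(data)))
print(dict(dict_rows[0]))

def rename_duplicates(header):
    """Suffix repeated names (a, a_2, ...) so every column gets its own key.

    Hypothetical helper; assumes the suffixed names do not already exist.
    """
    counts = {}
    renamed = []
    for name in header:
        counts[name] = counts.get(name, 0) + 1
        if counts[name] == 1:
            renamed.append(name)
        else:
            renamed.append("%s_%d" % (name, counts[name]))
    return renamed

# Plain csv.reader: read the header row ourselves, then build the dicts.
reader = csv.reader(io.StringIO(data))
header = rename_duplicates(next(reader))
all_rows = [dict(zip(header, row)) for row in reader]
print(all_rows[0])
```

With the renamed header, the first row keeps all five values instead of three.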

Upvotes: 2
