jhole89

Reputation: 828

Conditionally replace Python dictionary values with a comprehension

I am reading in a CSV via csv.DictReader and trying to replace any empty values with None. DictReader yields each row of the CSV as a dictionary (which I am fine with). However, when I try to iterate through it row by row and replace any empty values ("") with None, I come unstuck. I had previously written this as a list comprehension like this:

    for row in data:
        row = [None if not x else x for x in row]

But I need to switch to using dictionaries rather than lists. I've not had any experience with dictionary comprehensions before, and when I try to extend this to dictionaries I just can't get it to work. I was thinking something along the lines of:

    for row in data:
        row.values() = [None if not x else x for x in row.values()}

but I just get SyntaxError: invalid syntax. I've tried a lot of other things (too many to list here), like:

    for row in data:
        row = {k:None for k,v in row if v not v else v}

but this seems to have the same problem.

For reference, my data looks like:

    {'colour': 'ab6612', 'line': '1', 'name': 'Baker', 'stripe': ''}
    {'colour': 'f7dc00', 'line': '3', 'name': '', 'stripe': 'FFFFFF'}

and would ideally end up as:

    {'colour': 'ab6612', 'line': '1', 'name': 'Baker', 'stripe': None}
    {'colour': 'f7dc00', 'line': '3', 'name': None, 'stripe': 'FFFFFF'}

Upvotes: 2

Views: 5498

Answers (3)

Ja8zyjits

Reputation: 1502

If you are reading a CSV and the data is large, use iteritems() (Python 2). This avoids the large intermediate list that items() builds. Try:

    new_data = []
    for row in data:
        new_data.append({k: (v if v else None) for k, v in row.iteritems()})

If you don't understand comprehensions, follow this simple for loop:

    for row in data:
        for k, v in row.iteritems():
            if not v:
                row[k] = None

The second method is easier to understand and does not create an additional list, which is better for performance.
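On Python 3, iteritems() no longer exists and items() returns a lightweight view rather than a list, so the same idea can be written with items(). Below is a minimal sketch, assuming Python 3. The generator name clean_rows is made up for illustration and is not part of the csv module. It yields cleaned rows one at a time, which keeps memory use low when the input is large:

```python
def clean_rows(rows):
    """Yield a copy of each row with empty values replaced by None."""
    for row in rows:
        # items() is a view in Python 3, so no intermediate list is built
        yield {k: (v if v else None) for k, v in row.items()}


# Sample rows taken from the question
rows = [
    {'colour': 'ab6612', 'line': '1', 'name': 'Baker', 'stripe': ''},
    {'colour': 'f7dc00', 'line': '3', 'name': '', 'stripe': 'FFFFFF'},
]

# Materialised here only to inspect the result; iterate lazily for big files
cleaned = list(clean_rows(rows))
```

Passing a csv.DictReader instead of a list should work the same way, since the generator only needs something it can loop over.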

Upvotes: 0

Anand S Kumar

Reputation: 91009

Your issue is that you are rebinding the name row to a new dictionary inside the for loop. This does not change anything inside your original list/DictReader object, data.
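The difference between rebinding the name and mutating the dictionary can be seen in a small sketch (Python 3 assumed; the variable names unchanged and changed are only there for illustration):

```python
data = [{'name': 'Baker', 'stripe': ''}]

# Rebinding the loop variable only changes what the name `row` points to;
# the dictionary stored in `data` is left alone
for row in data:
    row = {k: (v if v else None) for k, v in row.items()}
unchanged = data[0]['stripe']  # still the empty string

# Assigning through `row[k]` mutates the dictionary that `data` holds
for row in data:
    for k, v in row.items():
        if not v:
            row[k] = None  # replacing a value for an existing key is safe here
changed = data[0]['stripe']  # now None
```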

If data is a list, you should enumerate over data and change the dictionary inside data (or make that element reference a new dictionary).

Example -

    for i, row in enumerate(data):
        data[i] = {k: (v if v else None) for k, v in row.items()}

Example test -

    >>> data = [{1: 2, 3: ''}, {4: '', 5: 6}]
    >>> for i, row in enumerate(data):
    ...     data[i] = {k: (v if v else None) for k, v in row.items()}
    ...
    >>> data
    [{1: 2, 3: None}, {4: None, 5: 6}]

Since you are using the DictReader class, you cannot directly change the DictReader object, so you should create a new list and add each changed row to it (or write the rows to a DictWriter object, which I would prefer) -

Example -

    >>> newdata = []
    >>> for row in data:
    ...     newdata.append({k: (v if v else None) for k, v in row.items()})
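Putting this together with csv.DictReader, an end-to-end sketch might look like the following. It assumes Python 3 and uses io.StringIO as a stand-in for an open CSV file; the CSV text is reconstructed from the sample rows in the question, so the real file may differ:

```python
import csv
import io

# Stand-in for an open CSV file; the contents mirror the question's sample rows
csv_text = "colour,line,name,stripe\nab6612,1,Baker,\nf7dc00,3,,FFFFFF\n"

reader = csv.DictReader(io.StringIO(csv_text))

# Build a new list of rows, replacing empty strings with None
newdata = [{k: (v if v else None) for k, v in row.items()} for row in reader]
```

With a real file, io.StringIO(csv_text) would be replaced by the file object returned from open(..., newline='').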

Upvotes: 5

301_Moved_Permanently

Reputation: 4196

Your main error is that you are trying to iterate twice over your dictionary whereas you only need to do it once.

Try:

    data = {k: (v if v else None) for k, v in data.items()}

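without the for loop. Note that this assumes data is a single dictionary; a list of rows or a DictReader has no items() method.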

Upvotes: 0
