remove nan values from defaultdict(list) of dicts

I ran some analysis and stored the results in a defaultdict(list), which I later write to a CSV file. Before writing, I'd like to remove every item whose 'Check2' value is nan.

How can I remove those dicts from the lists?

from numpy import nan
from collections import defaultdict

d = defaultdict(list, {
    'Address_1': [
        {'Name': 'name',
         'Address_match': 'address_match_1',
         'ID': 'id',
         'Type': 'abc',
         'Check1': 8,
         'Check2': 1},
        {'Name': 'name',
         'Address_match': 'address_match_2',
         'ID': 'id',
         'Type': 'abc',
         'Check1': 20,
         'Check2': nan},
        {'Name': 'name',
         'Address_match': 'address_match_3',
         'ID': 'id',
         'Type': 'abc',
         'Check1': 27,
         'Check2': nan}],
    'Address_2': [
        {'Name': 'name',
         'Address_match': 'address_match_1',
         'ID': 'id',
         'Type': 'abc',
         'Check1': 30,
         'Check2': 1},
        {'Name': 'name',
         'Address_match': 'address_match_2',
         'ID': 'id',
         'Type': 'abc',
         'Check1': 38,
         'Check2': nan},
        {'Name': 'name',
         'Address_match': 'address_match_3',
         'ID': 'id',
         'Type': 'abc',
         'Check1': 12,
         'Check2': nan}]})

Afterwards my results should be:

d = defaultdict(list, {
    'Address_1': [
        {'Name': 'name',
         'Address_match': 'address_match_1',
         'ID': 'id',
         'Type': 'abc',
         'Check1': 8,
         'Check2': 1}],
    'Address_2': [
        {'Name': 'name',
         'Address_match': 'address_match_1',
         'ID': 'id',
         'Type': 'abc',
         'Check1': 30,
         'Check2': 1}]})

Upvotes: 0

Answers (3)

Shimon Cohen

Reputation: 524

You can do something like this:

import math
def remove_nan_att(d, att):
    return {key: [o for o in d[key] if not math.isnan(o[att])] for key in d}

d = remove_nan_att(d, 'Check2')

This goes over the dict and, for each key, keeps only the records whose value for the chosen attribute is not nan. Note that it returns a plain dict rather than a defaultdict.

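If nan was imported from numpy, an identity check also works, but only when every nan in the data is that same object. A nan created elsewhere, such as float('nan'), would not be removed, so math.isnan above is the safer choice:

from numpy import nan

def remove_nan_att(d, att):
    return {key: [o for o in d[key] if o[att] is not nan] for key in d}

d = remove_nan_att(d, 'Check2')

And if you don't want to use it as a function:

att = 'Check2'
d = {key: [o for o in d[key] if o[att] is not nan] for key in d}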
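To illustrate why the identity check and math.isnan can give different results, here is a minimal sketch using only the standard library. It assumes two separately created float NaNs: `nan` stands in for the one imported from numpy (which is also a plain Python float), and `other_nan` is a hypothetical NaN produced somewhere else. The sample data is made up for the demonstration.

```python
import math

nan = float("nan")        # stands in for numpy.nan, which is a plain Python float
other_nan = float("nan")  # a NaN created elsewhere: same value, different object

sample = {"a": [{"Check2": 1}, {"Check2": nan}, {"Check2": other_nan}]}

# Identity check: only removes records holding the exact `nan` object.
by_identity = {key: [o for o in sample[key] if o["Check2"] is not nan]
               for key in sample}

# math.isnan: removes every NaN, whichever object it is.
by_isnan = {key: [o for o in sample[key] if not math.isnan(o["Check2"])]
            for key in sample}

print(len(by_identity["a"]))  # the record with other_nan is still there
print(len(by_isnan["a"]))     # only the record with Check2 == 1 is left
```

Comparing with `==` is not an alternative either, because NaN never compares equal to anything, including itself.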

Upvotes: 2

Corralien

Reputation: 120479

Try:

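import pandas as pd

# Series indexed by (address, position); each value is one record dict
df = pd.DataFrame.from_records(d).unstack()
# Keep the records whose Check2 is not NaN, then turn the addresses back into columns
d = df[df.str['Check2'].notna()].unstack(level=0).to_dict('list')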
print(d)

# Output:
{'Address_1': [{'Name': 'name',
   'Address_match': 'address_match_1',
   'ID': 'id',
   'Type': 'abc',
   'Check1': 8,
   'Check2': 1}],
 'Address_2': [{'Name': 'name',
   'Address_match': 'address_match_1',
   'ID': 'id',
   'Type': 'abc',
   'Check1': 30,
   'Check2': 1}]}

Update

You can simply use a dict comprehension with a nested list comprehension:

d = {k: [v for v in l if pd.notna(v['Check2'])] for k, l in d.items()}
print(d)

# Output:
{'Address_1': [{'Name': 'name',
   'Address_match': 'address_match_1',
   'ID': 'id',
   'Type': 'abc',
   'Check1': 8,
   'Check2': 1}],
 'Address_2': [{'Name': 'name',
   'Address_match': 'address_match_1',
   'ID': 'id',
   'Type': 'abc',
   'Check1': 30,
   'Check2': 1}]}

For readability, here is the same logic with plain loops, which also keeps the result a defaultdict:

data = defaultdict(list)
for k, l in d.items():  # for each key in d (Address_1, Address_2, ...)
    for v in l: # for each record in key {'Name': ...}
        if pd.notna(v['Check2']):  # check the condition
            data[k].append(v)  # append to the dict
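One thing to be aware of with the loop version: a key whose records are all filtered out never gets appended to, so it does not appear in the result. If you want to keep such keys with an empty list, a minimal sketch could look like the following. The helper names `is_nan` and `drop_nan_records` are hypothetical, the sample data is made up, and the sketch assumes that records missing the field should be kept.

```python
import math
from collections import defaultdict


def is_nan(value):
    """Return True only for a float NaN; str, None and int are never NaN."""
    return isinstance(value, float) and math.isnan(value)


def drop_nan_records(groups, field):
    """Hypothetical helper: filter each list, keeping every key and the defaultdict type."""
    result = defaultdict(list)
    for key, records in groups.items():
        # Assigning the filtered list keeps the key even when the list ends up empty.
        result[key] = [rec for rec in records if not is_nan(rec.get(field))]
    return result


nan = float("nan")
groups = defaultdict(list, {
    "Address_1": [{"Check1": 8, "Check2": 1}, {"Check1": 20, "Check2": nan}],
    "Address_2": [{"Check1": 38, "Check2": nan}],
})
cleaned = drop_nan_records(groups, "Check2")
print(dict(cleaned))
```

Because the check is guarded by isinstance, it does not raise a TypeError when a value is a string or None, which math.isnan alone would.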

Upvotes: 1

user7864386

You can use a dict comprehension with filter, which keeps the dictionaries whose Check2 is not np.nan in each list in d:

import numpy as np

out = {k: list(filter(lambda x: not np.isnan(x['Check2']), lst)) for k, lst in d.items()}

You can do the same using dict comprehension + list comprehension:

out = {k: [dct for dct in lst if not np.isnan(dct['Check2'])] for k, lst in d.items()}

Output:

{'Address_1': [{'Name': 'name',
   'Address_match': 'address_match_1',
   'ID': 'id',
   'Type': 'abc',
   'Check1': 8,
   'Check2': 1}],
 'Address_2': [{'Name': 'name',
   'Address_match': 'address_match_1',
   'ID': 'id',
   'Type': 'abc',
   'Check1': 30,
   'Check2': 1}]}

Upvotes: 1
