eng2019

Reputation: 1035

Filter NaN values out of a dictionary of lists

Let's say I have a dictionary:

dict_1 = {
    "key_1": [0., 5.6, 6.1, np.nan],
    "key_2": ["a", "t", "g", "r"],
    "key_3": [6.7, np.nan, 5.6, 4.1]
}

Every key maps to a list, and all the lists have the same length. I want to filter out the elements that are np.nan, along with the elements at the same index in the other lists, so this is the desired output:

result = {
    "key_1": [0., 6.1],
    "key_2": ["a", "g"],
    "key_3": [6.7, 5.6]
}

Can someone please help? I'd rather not use a for loop because that's slow. I tried np.isnan(dict_1.values()).any(axis=1), but it failed because the elements of key_2 are strings.

Thanks!

Upvotes: 0

Views: 264

Answers (3)

Jab

Reputation: 27515

You can use pandas' DataFrame.dropna, which by default drops every row that contains a missing value:

pd.DataFrame(dict_1).dropna().to_dict('list')

Result:

{'key_1': [0.0, 6.1], 'key_2': ['a', 'g'], 'key_3': [6.7, 5.6]}

Full Code (breakdown):

import pandas as pd
import numpy as np

dict_1 = {'key_1': [0.0, 5.6, 6.1, np.nan], 'key_2': ['a', 't', 'g', 'r'], 'key_3': [6.7, np.nan, 5.6, 4.1]}

df = pd.DataFrame(dict_1)
#    key_1 key_2  key_3
# 0    0.0     a    6.7
# 1    5.6     t    NaN
# 2    6.1     g    5.6
# 3    NaN     r    4.1

df.dropna(inplace=True)
#    key_1 key_2  key_3
# 0    0.0     a    6.7
# 2    6.1     g    5.6

df.to_dict('list')
# {'key_1': [0.0, 6.1], 'key_2': ['a', 'g'], 'key_3': [6.7, 5.6]}

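Without pandas, I believe your most efficient approach would be this mess of nested zip calls. It transposes the values into rows, keeps only the rows in which every entry equals itself (NaN is not equal to itself), and transposes back, converting each column to a list:

dict(zip(dict_1, map(list, zip(*(e for e in zip(*dict_1.values()) if all(x == x for x in e))))))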
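When filtering rows like this, it helps to know that a membership test such as `nan in row` checks identity before equality, so it only finds a NaN that is the very same object. The sketch below illustrates that behaviour with the standard library only and wraps the row filter in a more readable function. The helper names `is_nan` and `drop_nan_rows` are made up for this illustration, and the sketch assumes all lists have the same length.

```python
import math

nan = float("nan")

# Membership uses identity first, then ==, so only the very same NaN object is "found".
same_object_found = nan in [nan]               # True
other_object_found = float("nan") in [nan]     # False


def is_nan(x):
    """True only for float NaN; strings and other non-floats are never NaN."""
    return isinstance(x, float) and math.isnan(x)


def drop_nan_rows(d):
    """Drop every index at which any of the equal-length lists holds NaN."""
    rows = [row for row in zip(*d.values()) if not any(is_nan(x) for x in row)]
    # Transpose back to columns; use empty lists if every row was dropped.
    cols = [list(col) for col in zip(*rows)] if rows else [[] for _ in d]
    return dict(zip(d, cols))


dict_1 = {
    "key_1": [0.0, 5.6, 6.1, nan],
    "key_2": ["a", "t", "g", "r"],
    "key_3": [6.7, float("nan"), 5.6, 4.1],
}
result = drop_nan_rows(dict_1)
# {'key_1': [0.0, 6.1], 'key_2': ['a', 'g'], 'key_3': [6.7, 5.6]}
```

Checking each value explicitly is a little more verbose than a membership test, but it should behave the same whether the NaN came from np.nan, float("nan"), or a computation.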

Upvotes: 2

explodingfilms101

Reputation: 588

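In pure Python there's no way to do this without some kind of loop. You could make it a bit faster than regular for loops by using comprehensions though. First collect the indices at which no list has a NaN (NaN is the only value here that is not equal to itself), then keep only those indices in every list:

keep = [i for i, row in enumerate(zip(*dict_1.values())) if all(x == x for x in row)]
result = {k: [v[i] for i in keep] for k, v in dict_1.items()}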

Upvotes: 0

BrokenBenchmark

Reputation: 19251

If you don't want to use any dependencies, you can first find the indices to keep by transposing the values with zip(*dict_1.values()) and using a set comprehension that keeps an index only if every entry at that index is equal to itself (since NaN != NaN). Then use a dictionary comprehension to retain only the values at those indices:

indices_to_keep = {i for i, entry in enumerate(zip(*dict_1.values())) if all(x == x for x in entry)}
{k: [x for i, x in enumerate(v) if i in indices_to_keep] for k, v in dict_1.items()}

This outputs:

{'key_1': [0.0, 6.1], 'key_2': ['a', 'g'], 'key_3': [6.7, 5.6]}
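The same idea can also be expressed with a boolean mask instead of a set of indices. This is only a sketch of an alternative, not part of the answer above: it uses itertools.compress from the standard library, float("nan") in place of np.nan so that it runs without NumPy, and the name keep_mask is chosen for illustration.

```python
from itertools import compress

nan = float("nan")

dict_1 = {
    "key_1": [0.0, 5.6, 6.1, nan],
    "key_2": ["a", "t", "g", "r"],
    "key_3": [6.7, nan, 5.6, 4.1],
}

# One flag per index: True when every value at that index equals itself,
# which for floats and strings means that none of them is NaN.
keep_mask = [all(x == x for x in row) for row in zip(*dict_1.values())]

# compress() yields only the items whose flag is True.
result = {k: list(compress(v, keep_mask)) for k, v in dict_1.items()}
```

The mask is computed once and reused for every key, so each list is traversed a single time without any per-element set lookup.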

Upvotes: 0
