Reputation: 1035
Let's say I have a dictionary:
dict_1 = {
"key_1": [0., 5.6, 6.1, np.nan],
"key_2": ["a", "t", "g", "r"],
"key_3": [6.7, np.nan, 5.6, 4.1]
}
All keys map to lists, and all of the lists have the same length. I want to filter out the elements that are np.nan
(along with the elements at the same index under the other keys), so this is the desired output:
result = {
"key_1": [0., 6.1],
"key_2": ["a", "g"],
"key_3": [6.7, 5.6]
}
Can someone please help? I don't want to use a for loop because that's slow. I tried np.isnan(dict_1.values()).any(axis=1), but it failed because the elements of key_2 are strings.
Thanks!
Upvotes: 0
Views: 264
Reputation: 27515
You can use pandas' DataFrame.dropna:
pd.DataFrame(dict_1).dropna().to_dict('list')
Result:
{'key_1': [0.0, 6.1], 'key_2': ['a', 'g'], 'key_3': [6.7, 5.6]}
Full Code (breakdown):
import pandas as pd
import numpy as np
dict_1 = {'key_1': [0.0, 5.6, 6.1, np.nan], 'key_2': ['a', 't', 'g', 'r'], 'key_3': [6.7, np.nan, 5.6, 4.1]}
df = pd.DataFrame(dict_1)
# key_1 key_2 key_3
# 0 0.0 a 6.7
# 1 5.6 t NaN
# 2 6.1 g 5.6
# 3 NaN r 4.1
df.dropna(inplace=True)
# key_1 key_2 key_3
# 0 0.0 a 6.7
# 2 6.1 g 5.6
df.to_dict('list')
# {'key_1': [0.0, 6.1], 'key_2': ['a', 'g'], 'key_3': [6.7, 5.6]}
Without pandas, I believe the most efficient approach is this rather unreadable one-liner built on zip:
dict(zip(dict_1, zip(*(e for e in zip(*dict_1.values()) if np.nan not in e))))
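The one-liner above is dense, so here is a rough unrolled sketch of the same transpose, filter, transpose-back idea. The function name drop_nan_rows is made up for illustration, and the NaN check is swapped from the membership test to x != x so that NaN values created elsewhere (not only the np.nan object itself) are also caught. It assumes every list has the same length, as in the question.

```python
import math

def drop_nan_rows(columns):
    """Return a new dict without the positions where any value is NaN."""
    # Transpose: one tuple per position, spanning all keys.
    rows = zip(*columns.values())
    # x != x is true only for NaN and is harmless for strings.
    kept = [row for row in rows if not any(x != x for x in row)]
    # Transpose back; if no row survives, give every key an empty list.
    new_columns = zip(*kept) if kept else ([] for _ in columns)
    return {key: list(col) for key, col in zip(columns, new_columns)}

dict_1 = {
    "key_1": [0.0, 5.6, 6.1, math.nan],
    "key_2": ["a", "t", "g", "r"],
    "key_3": [6.7, float("nan"), 5.6, 4.1],
}
result = drop_nan_rows(dict_1)
print(result)
# {'key_1': [0.0, 6.1], 'key_2': ['a', 'g'], 'key_3': [6.7, 5.6]}
```

This still loops in Python, so for large inputs the pandas version is likely the faster option.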
Upvotes: 2
Reputation: 588
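There's no way to do this in pure Python without looping. Comprehensions are usually a bit faster than regular for loops, though. Note that the filter has to be applied per index across all keys, not per key:
keep = [i for i, row in enumerate(zip(*dict_1.values())) if all(x == x for x in row)]
result = {k: [v[i] for i in keep] for k, v in dict_1.items()}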
Upvotes: 0
Reputation: 19251
If you don't want to use any dependencies, you can first find the indices to keep by transposing the values with zip and using a set comprehension that keeps an index only if every entry at that position is equal to itself (since NaN != NaN). Then, use a dictionary comprehension to retain only the values at those indices:
indices_to_keep = {i for i, entry in enumerate(zip(*dict_1.values())) if all(x == x for x in entry)}
{k: [x for i, x in enumerate(v) if i in indices_to_keep] for k, v in dict_1.items()}
This outputs:
{'key_1': [0.0, 6.1], 'key_2': ['a', 'g'], 'key_3': [6.7, 5.6]}
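To see why the self-comparison is a reasonable NaN test here, the small sketch below compares it with a membership test. The helper name has_nan is hypothetical; the point is that `in` checks identity before equality, so it only finds a NaN that is the very same object.

```python
import math

a = math.nan
b = float("nan")  # a separate NaN object

def has_nan(values):
    """True if any value is NaN; strings and other types compare equal to themselves."""
    return any(x != x for x in values)

print(a == a)                   # False: NaN never equals itself
print(a in [a])                 # True: `in` checks identity first
print(b in [a])                 # False: different object, and never equal
print(has_nan([b, "s", 1.0]))   # True
print(has_nan(["s", 1.0]))      # False
```

So the self-comparison works regardless of where the NaN came from, and it does not raise on strings the way np.isnan does.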
Upvotes: 0