Reputation: 7448
I need to decide whether the values of a certain column (df['some_col']) in a data frame contain only a specific set of values (e.g. 'a', the empty string and NaN, i.e. ["a", "", NaN]). I can think of using unique to list all the unique values and check whether any value is not in the predefined set, but I am not sure whether NaN is considered a value or not.
Upvotes: 1
Views: 985
Reputation: 210812
Yes, you can use unique() for that:
In [35]: w
Out[35]:
     word
0  word03
1     NaN
2  word04
3
4  word02
5  word01
6     NaN
7  word01
8  word01
9  word01

In [36]: w.word.unique()
Out[36]: array(['word03', nan, 'word04', '', 'word02', 'word01'], dtype=object)
So NaN does show up among the unique values. Using sets, we can then see which strings in your DF are not among the allowed/expected ones:
In [45]: allowed_words = set(['','word01', np.nan])
In [46]: set(w.word.unique()) - allowed_words
Out[46]: {'word02', 'word03', 'word04'}
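One caveat: NaN never compares equal to itself, so whether the set difference drops it can depend on the NaN in the column being the very same object as np.nan. A minimal sketch of a variant that treats NaN separately, assuming the same data as w above (rebuilt here so the snippet runs on its own; the names allowed, mask and unexpected are only illustrative):

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the session above
w = pd.DataFrame({"word": ["word03", np.nan, "word04", "", "word02",
                           "word01", np.nan, "word01", "word01", "word01"]})

# Allowed non-missing values; NaN is handled explicitly below
allowed = ["", "word01"]

# True where the value is either allowed or missing
mask = w["word"].isin(allowed) | w["word"].isna()

# Does the column contain only allowed values (NaN included)?
only_allowed = bool(mask.all())

# The offending values, if any
unexpected = set(w.loc[~mask, "word"].unique())

print(only_allowed)
print(unexpected)
```

If NaN should not be allowed, dropping the `| w["word"].isna()` part makes missing values count as unexpected.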
Upvotes: 3