Prashant_J

Reputation: 354

Missing data in a column of pandas dataframe

I am creating a dataframe named "salesdata" with a column named "Outlet_Size". This column contains some missing data. This is my code:

# Define a dictionary
cat_dict = {}
# Get all the values of the column
outlet_size_values = salesdata.Outlet_Size.values
unique_outlet_size_val = list(set(outlet_size_values))
print(unique_outlet_size_val)

The output I am getting is [nan, 'High', 'Medium', 'Small']. I don't want this missing data (nan) to be part of my list, and I don't want to create a new list for this.

Upvotes: 3

Views: 376

Answers (3)

piRSquared

Reputation: 294258

You can use numpy.unique after dropping the missing values with dropna:

import pandas as pd
import numpy as np

np.unique(salesdata.Outlet_Size.dropna().values)
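As a minimal sketch of this approach, assuming a small made-up salesdata frame (the sample values are illustrative, not the asker's data), the call can be tried end to end. Note that numpy.unique returns the distinct values in sorted order:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the asker's salesdata frame
salesdata = pd.DataFrame(
    {
        "Outlet_Size": pd.Series(
            ["Medium", np.nan, "High", "Small", "Medium", np.nan],
            dtype=object,
        )
    }
)

# Drop the missing values first, then let NumPy deduplicate and sort
unique_sizes = np.unique(salesdata.Outlet_Size.dropna().values)
print(unique_sizes.tolist())
```

Calling .tolist() on the result gives a plain Python list if that is what the rest of the code expects.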

Upvotes: 0

Zeugma

Reputation: 32095

Use basic pandas functions: dropna to remove the nan values, then unique to get the set-equivalent result:

salesdata.Outlet_Size.dropna().unique()
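A short sketch of the same call on made-up data (the frame below is an assumption for illustration only). Unlike a set, Series.unique keeps the values in order of first appearance:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the asker's salesdata frame
salesdata = pd.DataFrame(
    {
        "Outlet_Size": pd.Series(
            ["Medium", np.nan, "High", "Small", "Medium", np.nan],
            dtype=object,
        )
    }
)

# dropna removes the NaN entries, unique keeps first-seen order
unique_outlet_size_val = salesdata.Outlet_Size.dropna().unique().tolist()
print(unique_outlet_size_val)
```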

Upvotes: 3

al0

Reputation: 308

pandas has the function unique to get distinct values. You can use it and filter out the NaNs like this:

salesdata.loc[~salesdata.Outlet_Size.isnull(), 'Outlet_Size'].unique()
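To make the boolean-mask idea concrete, here is a sketch on made-up data (the frame and the variable names mask, masked and masked_alt are assumptions for illustration). It also shows that notna() is an equivalent spelling of ~isnull():

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the asker's salesdata frame
salesdata = pd.DataFrame(
    {
        "Outlet_Size": pd.Series(
            ["Medium", np.nan, "High", "Small", "Medium", np.nan],
            dtype=object,
        )
    }
)

# True for rows where Outlet_Size is present, False where it is NaN
mask = ~salesdata.Outlet_Size.isnull()

# Select only the non-missing rows of the column, then deduplicate
masked = salesdata.loc[mask, "Outlet_Size"].unique().tolist()

# Equivalent spelling using notna() instead of negating isnull()
masked_alt = salesdata.loc[salesdata.Outlet_Size.notna(), "Outlet_Size"].unique().tolist()

print(masked)
```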

Upvotes: 2
