user3001937

Reputation: 2113

How can I remove NaN from a list in Python/NumPy?

I have a list that contains values, and one of the values is 'nan':

countries= [nan, 'USA', 'UK', 'France']

I tried to remove it, but I get an error every time:

cleanedList = [x for x in countries if (math.isnan(x) == True)]
TypeError: a float is required

When I tried this instead:

cleanedList = cities[np.logical_not(np.isnan(countries))]
cleanedList = cities[~np.isnan(countries)]

TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

Upvotes: 167

Views: 468215

Answers (16)

GenDemo

Reputation: 761

I had a similar problem to solve, and strangely none of the approaches suggested above worked (Python 3.7.9), but this one did:

df['colA'] = df['colA'].apply(lambda x: [item for item in x if not pd.isna(item)])

Upvotes: 0

Sayed

Reputation: 5

import numpy as np
countries=[x for x in countries if x is not np.nan]
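
One caveat worth illustrating: `is not np.nan` is an identity test, so it can only recognise that one particular NaN object. The sketch below uses only the standard library, with `float('nan')` standing in for NaN values produced elsewhere (parsing, arithmetic, and so on); the variable names are illustrative and not from the answer above.

```python
import math

nan_a = float('nan')   # one NaN object
nan_b = float('nan')   # a second, distinct NaN object

countries = [nan_a, 'USA', nan_b, 'UK']

# Identity test: drops only the exact object it is compared against.
by_identity = [x for x in countries if x is not nan_a]

# Value test: drops every float NaN, whichever object it happens to be.
by_value = [x for x in countries
            if not (isinstance(x, float) and math.isnan(x))]

print(len(by_identity))  # 3, because nan_b is still in the list
print(by_value)          # ['USA', 'UK']
```

If the NaN values in a list might not all be the same object, a value-based test is likely the safer choice.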

Upvotes: -3

Yohan Obadia

Reputation: 2672

The problem comes from the fact that np.isnan() does not accept string values. For example, if you do:

np.isnan("A")
TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

However the pandas version pd.isnull() works for numeric and string values:

import pandas as pd
pd.isnull("A")
> False

pd.isnull(3)
> False

pd.isnull(np.nan)
> True

pd.isnull(None)
> True

Upvotes: 24

Aaron England

Reputation: 1273

I like to remove missing values from a list like this:

import pandas as pd
list_no_nan = [x for x in list_with_nan if pd.notnull(x)]

Upvotes: 13

vlmercado

Reputation: 1888

Using your example where...

countries= [nan, 'USA', 'UK', 'France']

Since nan is not equal to nan (nan != nan) and countries[0] = nan, you should observe the following:

countries[0] == countries[0]
False

However,

countries[1] == countries[1]
True
countries[2] == countries[2]
True
countries[3] == countries[3]
True

Therefore, the following should work:

cleanedList = [x for x in countries if x == x]
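
As a quick check of the idea above, here is a minimal sketch that uses a plain `float('nan')` in place of the `nan` from the question (how that value was originally produced is an assumption):

```python
nan = float('nan')
countries = [nan, 'USA', 'UK', 'France']

# NaN is the only value in this list that is not equal to itself.
print(nan == nan)  # False

cleaned_list = [x for x in countries if x == x]
print(cleaned_list)  # ['USA', 'UK', 'France']
```

This needs no imports and works for mixed strings and floats, since every ordinary value compares equal to itself.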

Upvotes: 64

user764357

Reputation:

The question has changed, so too has the answer:

Strings can't be tested using math.isnan as this expects a float argument. In your countries list, you have floats and strings.

In your case the following should suffice:

cleanedList = [x for x in countries if str(x) != 'nan']
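
One thing to keep in mind with the string comparison: it cannot tell a float NaN apart from a genuine string `'nan'`, so both are dropped. A small sketch, with an illustrative `values` list that is not from the question:

```python
nan = float('nan')
values = [nan, 'USA', 'nan', 'UK']

# str(nan) is 'nan', so the float NaN and the string 'nan' are both filtered out.
cleaned = [x for x in values if str(x) != 'nan']
print(cleaned)  # ['USA', 'UK']
```

If the data could legitimately contain the text 'nan', a type-aware test may be preferable.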

Old answer

In your countries list, the literal 'nan' is a string, not the Python float nan, which is equivalent to:

float('NaN')

In your case the following should suffice:

cleanedList = [x for x in countries if x != 'nan']

Upvotes: 217

user7864386

Reputation:

If you have a list of items of different types and you want to filter out NaN, you can do the following:

import math
lst = [1.1, 2, 'string', float('nan'), {'di':'ct'}, {'set'}, (3, 4), ['li', 5]]
filtered_lst = [x for x in lst if not (isinstance(x, float) and math.isnan(x))]

Output:

[1.1, 2, 'string', {'di': 'ct'}, {'set'}, (3, 4), ['li', 5]]

Upvotes: 3

Angelo

Reputation: 1765

In my opinion, most of the suggested solutions do not take performance into account. For loops and list comprehensions are not good solutions if your list has many values. The solution below is more efficient in terms of computation time, and it doesn't assume your list contains only numbers or only strings.

import numpy as np
import pandas as pd
list_var = [np.nan, 4, np.nan, 20,3, 'test']
df = pd.DataFrame({'list_values':list_var})
list_var2 = list(df['list_values'].dropna())
print("\n* list_var2 = {}".format(list_var2))

Upvotes: 2

exclude 0 from the range list

['ret'+str(x) for x in list(range(-120,241,5)) if (x!=0) ]

Upvotes: 0

Zisis F

Reputation: 362

A way to directly remove the nan value is:

import numpy as np    
countries.remove(np.nan)
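
Two behaviours of `list.remove` are worth knowing before relying on this: it removes only the first match, and it raises `ValueError` when nothing matches. The sketch below shows both using only the standard library, with a `float('nan')` object standing in for `np.nan` (an assumption made to keep the example dependency-free).

```python
nan = float('nan')
countries = [nan, 'USA', nan, 'UK']

# Only the first NaN is removed; the second one stays in the list.
countries.remove(nan)
print(len(countries))  # 3

# With no NaN present, remove() raises ValueError.
no_nan = ['USA', 'UK']
try:
    no_nan.remove(nan)
    raised = False
except ValueError:
    raised = True
print(raised)  # True
```

For lists that may hold several NaN values, or none, a filtering comprehension avoids both issues.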

Upvotes: 5

Sorin Dragan

Reputation: 540

Another way to do it would include using filter like this:

countries = list(filter(lambda x: str(x) != 'nan', countries))

Upvotes: 4

Ajay Shah

Reputation: 414

import numpy as np

mylist = [3, 4, 5, np.nan]
l = [x for x in mylist if ~np.isnan(x)]

This should remove all NaN. Of course, I assume that it is not a string here but actual NaN (np.nan).

Upvotes: 17

Beyran11

Reputation: 61

If you check the type of the NaN element:

type(countries[0])

the result will be <class 'float'>, so you can use the following code:

[i for i in countries if type(i) is not float]
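
Note that this filter removes every float, not just NaN. That is fine for a list of country names, but the sketch below (with an illustrative mixed list that is not from the question) shows what happens when real floats are present:

```python
values = [float('nan'), 'USA', 2.5, 'UK', 7]

# Every float is dropped, including the legitimate value 2.5; the int 7 is kept.
no_floats = [i for i in values if type(i) is not float]
print(no_floats)  # ['USA', 'UK', 7]
```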

Upvotes: 6

sparrow

Reputation: 11460

I noticed that pandas, for example, returns nan for blank values. Since it's not a string, you need to convert it to one in order to match it. For example:

ulist = df.column1.unique()  # unique values from a pandas column
for loc in ulist:
    loc = str(loc)   # NaN becomes the string 'nan', so it can be compared below
    if loc != 'nan':
        print(loc)

Upvotes: -1

zhangxaochen

Reputation: 33997

Use NumPy fancy indexing:

In [29]: countries=np.asarray(countries)

In [30]: countries[countries!='nan']
Out[30]: 
array(['USA', 'UK', 'France'], 
      dtype='|S6')

Upvotes: 7

Serial

Reputation: 8045

In your example, 'nan' is a string, so instead of using isnan() just check for the string, like this:

cleanedList = [x for x in countries if x != 'nan']

Upvotes: 2
