doyz

Reputation: 886

Remove rows from DataFrame that contain null values within numpy array

I am trying to remove rows from a DataFrame where the numpy array stored in a column contains null (NaN) values.

DataFrame:

name    array   
A       [nan, nan, nan] 
B       [111.425818592, -743.060293425, -180.420675659] 

Expected output

name    array   
B       [111.425818592, -743.060293425, -180.420675659] 

My attempt:

df = df[df['array'].apply(lambda x: np.where(~np.isnan(x)))]

The error I am getting is:

TypeError: unhashable type: 'numpy.ndarray'

Upvotes: 2

Views: 1202

Answers (3)

DJK

Reputation: 9274

You really should consider dropping the use of numpy arrays within DataFrame columns; every operation you do on the series is going to be a headache. Instead, just convert the data into a regular DataFrame and then use pandas functionality:

dfnew = pd.DataFrame(np.concatenate([df.name.values.reshape(-1,1),
                     np.array(df.array.tolist())],axis=1),
                     columns=['name','array1','array2','array3'])

  name   array1  array2   array3
0    A      NaN     NaN      NaN
1    B  111.426 -743.06 -180.421

Now you can use dropna()

dfnew.dropna(axis=0)

  name   array1  array2   array3
1    B  111.426 -743.06 -180.421

You can then always extract a single array if need be with:

dfnew.iloc[1,1:].values

array([111.425818592, -743.060293425, -180.420675659], dtype=object)
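
As the dtype=object in the output above suggests, concatenating the names with the numbers produces object columns. A minimal sketch of an alternative that keeps the numeric columns as floats, assuming every array has the same length of three (the column names array1 to array3 are only illustrative):

```python
import numpy as np
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame({
    "name": ["A", "B"],
    "array": [
        np.array([np.nan, np.nan, np.nan]),
        np.array([111.425818592, -743.060293425, -180.420675659]),
    ],
})

# Expand each array into its own float columns, keeping the original index.
wide = pd.DataFrame(
    df["array"].tolist(),
    index=df.index,
    columns=["array1", "array2", "array3"],  # illustrative names
)

# Put the name column back next to the numeric columns and drop NaN rows.
dfnew = pd.concat([df[["name"]], wide], axis=1).dropna(axis=0)
```

Because the name column is joined afterwards rather than concatenated at the numpy level, the three value columns should stay float64 instead of becoming object.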

Upvotes: 0

jpp

Reputation: 164783

Here is one way:

import pandas as pd, numpy as np

df = pd.DataFrame([['A', np.array([np.nan, np.nan, np.nan])],
                   ['B', np.array([111.425818592, -743.060293425, -180.420675659])]],
                  columns=['name', 'array'])

df = df[~np.all(list(map(np.isnan, df['array'])), axis=1)]

#   name                                            array
# 1    B  [111.425818592, -743.060293425, -180.420675659]

Or, if you want to remove rows where any values of the array are NaN:

df = df[~np.any(list(map(np.isnan, df['array'])), axis=1)]
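
To make the difference between the two variants visible, here is a small sketch with an extra, hypothetical row C whose array is only partly NaN (that row is not in the question's data):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["A", "B", "C"],
    "array": [
        np.array([np.nan, np.nan, np.nan]),
        np.array([111.425818592, -743.060293425, -180.420675659]),
        np.array([1.0, np.nan, 3.0]),  # hypothetical partly-NaN row
    ],
})

# One boolean row per DataFrame row: True where the element is NaN.
nan_mask = np.array([np.isnan(a) for a in df["array"]])

# Drop only rows whose array is entirely NaN.
drop_all_nan = df[~nan_mask.all(axis=1)]

# Drop rows whose array contains at least one NaN.
drop_any_nan = df[~nan_mask.any(axis=1)]
```

With this data, drop_all_nan keeps B and C, while drop_any_nan keeps only B. This assumes all arrays have the same length, so that the mask is a regular 2-D array.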

Upvotes: 0

BENY

Reputation: 323356

Using the data from jpp's answer:

df[~pd.DataFrame(df.array.tolist()).isnull().all(axis=1)]
Out[391]: 
  name                                            array
1    B  [111.425818592, -743.060293425, -180.420675659]
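
For comparison, a sketch that stays close to the attempt in the question. As far as I can tell, that attempt fails because np.where returns a tuple of index arrays rather than one True/False per row, so pandas does not get a boolean mask and ends up trying to hash the arrays as keys. Returning a single boolean from the lambda avoids this:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["A", "B"],
    "array": [
        np.array([np.nan, np.nan, np.nan]),
        np.array([111.425818592, -743.060293425, -180.420675659]),
    ],
})

# One boolean per row: True when the array is NOT entirely NaN.
keep = df["array"].apply(lambda x: not np.isnan(x).all())

result = df[keep]
```

Replace .all() with .any() inside the lambda to drop rows that contain at least one NaN instead. Unlike the approaches that build a 2-D mask, this one does not need the arrays to share a length, at the cost of a Python-level call per row.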

Upvotes: 2
