Anonymous

Reputation: 152

How to filter a DataFrame by a condition on a column of NumPy arrays?

I have a pandas DataFrame with a column whose values are NumPy arrays. I am trying to get the rows where that column holds an empty NumPy array, but the usual boolean indexing doesn't work. Here is an example dataframe:

import numpy as np
import pandas as pd
data = {'Name': ['A', 'B', 'C', 'D'], 'stats': [np.array([1,1,1]), np.array([]), np.array([2,2,2]), np.array([])]}
df = pd.DataFrame(data)

I am trying to get just the rows where 'stats' is empty, but df[df['stats'] is None] raises KeyError: False. How can I filter for rows that contain an empty array?
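The KeyError comes from how Python evaluates the expression: "is" checks whether the whole Series object is None, which yields one plain bool (False) instead of one bool per row, and df[False] is then treated as a lookup of a column labelled False. A minimal sketch reproducing this with the example dataframe, followed by an element-wise mask. Counting None as "empty" and using a.size == 0 are assumptions for illustration, not part of the original question:

```python
import numpy as np
import pandas as pd

data = {'Name': ['A', 'B', 'C', 'D'],
        'stats': [np.array([1, 1, 1]), np.array([]), np.array([2, 2, 2]), np.array([])]}
df = pd.DataFrame(data)

# 'is' compares object identity: the Series itself is not None,
# so this is a single Python bool, not a per-row boolean mask.
mask = df['stats'] is None
print(type(mask), mask)  # <class 'bool'> False

# df[False] is interpreted as selecting a column labelled False,
# which does not exist, hence the KeyError.
try:
    df[mask]
except KeyError as exc:
    print('KeyError:', exc)

# An element-wise test gives one bool per row instead.
# Treating None as empty is an assumption, in case the column mixes None and arrays.
empty_mask = df['stats'].apply(lambda a: a is None or a.size == 0)
print(df[empty_mask])  # rows B and D
```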

Additionally, how can I filter for rows where the array equals a specific array, i.e. get all rows of df where df['stats'] == np.array([1, 1, 1])?

Thanks

Upvotes: 0

Views: 238

Answers (2)

Niv Dudovitch

Reputation: 1658

For the second question, "how can I filter by row where the numpy array is something specific?", you can compare each array with np.array_equal, which checks that both shape and values match:

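import numpy as np
import pandas as pd
data = {'Name': ['A', 'B', 'C', 'D'], 'stats': [np.array([1,1,1]), np.array([]), np.array([2,2,2]), np.array([])]}
df = pd.DataFrame(data)
# keep only the rows whose array equals [1, 1, 1] (same shape and values)
df = df[df['stats'].apply(lambda x: np.array_equal(x, np.array([1,1,1])))]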
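The same np.array_equal test also covers the first question, since comparing against np.array([]) matches exactly the empty arrays. A small sketch wrapping it in a helper; rows_matching is a hypothetical name chosen for illustration, not a pandas or NumPy function:

```python
import numpy as np
import pandas as pd

data = {'Name': ['A', 'B', 'C', 'D'],
        'stats': [np.array([1, 1, 1]), np.array([]), np.array([2, 2, 2]), np.array([])]}
df = pd.DataFrame(data)

def rows_matching(frame, column, target):
    """Hypothetical helper: rows whose array in `column` equals `target`."""
    # np.array_equal returns False on a shape mismatch instead of raising,
    # so arrays of a different length are simply not selected.
    mask = frame[column].apply(lambda x: np.array_equal(x, target))
    return frame[mask]

print(rows_matching(df, 'stats', np.array([1, 1, 1])))  # row A
print(rows_matching(df, 'stats', np.array([])))         # rows B and D
```

Unlike a plain ==, which compares element by element and can fail when lengths differ, np.array_equal always returns a single bool per row, which is what boolean indexing needs.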

Upvotes: 1

jezrael

Reputation: 863166

You can check the length with Series.str.len, because it works on sequences such as lists, tuples and arrays, not just strings:

print(df['stats'].str.len())
0    3
1    0
2    3
3    0
Name: stats, dtype: int64

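Then filter, e.g. for rows with length 0 (assigning to a new name, so df still holds all rows for the next example):

empty_rows = df[df['stats'].str.len().eq(0)]
# alternative:
# empty_rows = df[df['stats'].apply(len).eq(0)]
print(empty_rows)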
  Name stats
1    B    []
3    D    []

To test for a specific array, convert each array to a tuple and compare it with the target converted to a tuple:

df = df[df['stats'].apply(tuple) == tuple(np.array([1, 1, 1]))]
print(df)
  Name      stats
0    A  [1, 1, 1]
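Because tuples are hashable, the same conversion can also be combined with Series.isin to keep rows matching any of several target arrays. A sketch, assuming the original four-row df from the question; the targets list is illustrative only:

```python
import numpy as np
import pandas as pd

data = {'Name': ['A', 'B', 'C', 'D'],
        'stats': [np.array([1, 1, 1]), np.array([]), np.array([2, 2, 2]), np.array([])]}
df = pd.DataFrame(data)

# Convert each array to a hashable tuple once...
as_tuples = df['stats'].apply(tuple)

# ...then test membership against a list of target tuples (illustrative targets).
targets = [(1, 1, 1), (2, 2, 2)]
matches = df[as_tuples.isin(targets)]
print(matches)  # rows A and C
```

Note that tuple conversion flattens only the first axis, so this suits 1-D arrays like the ones in the question.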

Upvotes: 1
