pc94
pc94

Reputation: 45

search string in pandas column

I am trying to find a substring in below hard_skills_name column, like i want all rows which has 'Apple Products' as hard skill.

enter image description here

I tried below code:

df.loc[df['hard_skills_name'].str.contains("Apple Products", case=False)]

but getting this error:

KeyError                                  Traceback (most recent call last)
<ipython-input-49-acdcdfbdfd3d> in <module>
----> 1 df.loc[df['hard_skills_name'].str.contains("Apple Products", case=False)]

~/anaconda3/envs/python3/lib/python3.6/site-packages/pandas/core/indexing.py in __getitem__(self, key)
    877 
    878             maybe_callable = com.apply_if_callable(key, self.obj)
--> 879             return self._getitem_axis(maybe_callable, axis=axis)
    880 
    881     def _is_scalar_access(self, key: Tuple):

~/anaconda3/envs/python3/lib/python3.6/site-packages/pandas/core/indexing.py in _getitem_axis(self, key, axis)
   1097                     raise ValueError("Cannot index with multidimensional key")
   1098 
-> 1099                 return self._getitem_iterable(key, axis=axis)
   1100 
   1101             # nested tuple slicing

~/anaconda3/envs/python3/lib/python3.6/site-packages/pandas/core/indexing.py in _getitem_iterable(self, key, axis)
   1035 
   1036         # A collection of keys
-> 1037         keyarr, indexer = self._get_listlike_indexer(key, axis, raise_missing=False)
   1038         return self.obj._reindex_with_indexers(
   1039             {axis: [keyarr, indexer]}, copy=True, allow_dups=True

~/anaconda3/envs/python3/lib/python3.6/site-packages/pandas/core/indexing.py in _get_listlike_indexer(self, key, axis, raise_missing)
   1252             keyarr, indexer, new_indexer = ax._reindex_non_unique(keyarr)
   1253 
-> 1254         self._validate_read_indexer(keyarr, indexer, axis, raise_missing=raise_missing)
   1255         return keyarr, indexer
   1256 

~/anaconda3/envs/python3/lib/python3.6/site-packages/pandas/core/indexing.py in _validate_read_indexer(self, key, indexer, axis, raise_missing)
   1296             if missing == len(indexer):
   1297                 axis_name = self.obj._get_axis_name(axis)
-> 1298                 raise KeyError(f"None of [{key}] are in the [{axis_name}]")
   1299 
   1300             # We (temporarily) allow for some missing keys with .loc, except in

KeyError: "None of [Float64Index([nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,\n              nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,\n              nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,\n              nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,\n              nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,\n              nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,\n              nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,\n              nan, nan, nan, nan, nan, nan, nan, nan, nan],\n             dtype='float64')] are in the [index]"

Upvotes: 2

Views: 837

Answers (2)

SeaBean
SeaBean

Reputation: 23217

Try to chain (temporarily) conversion of the list of strings to comma separated strings by str.join() before string search:

df[df['hard_skills_name'].str.join(', ').str.contains("Apple Products", case=False)]

The problem was owing to the string you are going to search is contained within a list. You cannot search the string in list directly with .str.contains(). To solve it, you can convert the list of strings to a long string first (e.g. with commas separating the substrings) by .str.join() before doing your string search.

Upvotes: 3

cegarza
cegarza

Reputation: 135

Your index has null values. You're going to have to make a boolean mask for this. Directly answering your question:

df.loc[(df.index.notnull()) & (df['hard_skills_name'].str.contains("Apple Products", case=False))] 

This should exclude anything that has null index values and does contain the given string in hard_skills_name

However, I suspect that this will also exclude some data that you're looking for. The solution in that case would be to change your index to not have any NaNs. Whether that means replacing it with a placeholder value or creating a brand new index, that's up to you.

Upvotes: 1

Related Questions