Reputation: 45
I am trying to find a substring in the hard_skills_name column of my DataFrame: I want all rows that have 'Apple Products' as a hard skill.
I tried the code below:
df.loc[df['hard_skills_name'].str.contains("Apple Products", case=False)]
but I get this error:
KeyError Traceback (most recent call last)
<ipython-input-49-acdcdfbdfd3d> in <module>
----> 1 df.loc[df['hard_skills_name'].str.contains("Apple Products", case=False)]
~/anaconda3/envs/python3/lib/python3.6/site-packages/pandas/core/indexing.py in __getitem__(self, key)
877
878 maybe_callable = com.apply_if_callable(key, self.obj)
--> 879 return self._getitem_axis(maybe_callable, axis=axis)
880
881 def _is_scalar_access(self, key: Tuple):
~/anaconda3/envs/python3/lib/python3.6/site-packages/pandas/core/indexing.py in _getitem_axis(self, key, axis)
1097 raise ValueError("Cannot index with multidimensional key")
1098
-> 1099 return self._getitem_iterable(key, axis=axis)
1100
1101 # nested tuple slicing
~/anaconda3/envs/python3/lib/python3.6/site-packages/pandas/core/indexing.py in _getitem_iterable(self, key, axis)
1035
1036 # A collection of keys
-> 1037 keyarr, indexer = self._get_listlike_indexer(key, axis, raise_missing=False)
1038 return self.obj._reindex_with_indexers(
1039 {axis: [keyarr, indexer]}, copy=True, allow_dups=True
~/anaconda3/envs/python3/lib/python3.6/site-packages/pandas/core/indexing.py in _get_listlike_indexer(self, key, axis, raise_missing)
1252 keyarr, indexer, new_indexer = ax._reindex_non_unique(keyarr)
1253
-> 1254 self._validate_read_indexer(keyarr, indexer, axis, raise_missing=raise_missing)
1255 return keyarr, indexer
1256
~/anaconda3/envs/python3/lib/python3.6/site-packages/pandas/core/indexing.py in _validate_read_indexer(self, key, indexer, axis, raise_missing)
1296 if missing == len(indexer):
1297 axis_name = self.obj._get_axis_name(axis)
-> 1298 raise KeyError(f"None of [{key}] are in the [{axis_name}]")
1299
1300 # We (temporarily) allow for some missing keys with .loc, except in
KeyError: "None of [Float64Index([nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,\n nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,\n nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,\n nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,\n nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,\n nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,\n nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,\n nan, nan, nan, nan, nan, nan, nan, nan, nan],\n dtype='float64')] are in the [index]"
Upvotes: 2
Views: 837
Reputation: 23217
Try chaining a temporary conversion of each list of strings into a comma-separated string with str.join() before the string search:
df[df['hard_skills_name'].str.join(', ').str.contains("Apple Products", case=False)]
The problem is that the string you are searching for is contained in a list. You cannot search for a string inside a list directly with .str.contains(). To solve it, first convert each list of strings into one long string (e.g. with commas separating the items) using .str.join(), then do your string search.
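As a minimal sketch of the approach above, here is a self-contained example. The sample data is made up for illustration and assumes each cell of hard_skills_name holds a list of strings; na=False and regex=False are optional additions (not part of the original one-liner) that treat missing rows as non-matches and match the text literally.

```python
import pandas as pd

# Hypothetical sample data: each cell is a list of skill names.
df = pd.DataFrame(
    {
        "hard_skills_name": [
            ["Apple Products", "Python"],
            ["SQL"],
            ["apple products", "Excel"],
        ]
    }
)

# Join each list into one comma-separated string, e.g. "Apple Products, Python".
joined = df["hard_skills_name"].str.join(", ")

# Case-insensitive literal search on the joined strings.
# na=False makes rows with missing values count as "no match".
mask = joined.str.contains("Apple Products", case=False, na=False, regex=False)

result = df[mask]
print(result)
```

Because the mask is a proper boolean Series, df[mask] and df.loc[mask] both select rows by truth value instead of trying to look the values up as index labels.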
Upvotes: 3
Reputation: 135
Your index has null values. You're going to have to make a boolean mask for this. Directly answering your question:
df.loc[(df.index.notnull()) & (df['hard_skills_name'].str.contains("Apple Products", case=False))]
This should exclude any row whose index value is null and keep only the rows where hard_skills_name contains the given string.
However, I suspect that this will also exclude some data that you're looking for. The solution in that case would be to change your index to not have any NaNs. Whether that means replacing it with a placeholder value or creating a brand new index, that's up to you.
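If you want to check whether the index really contains nulls before changing it, something like the following sketch may help. The DataFrame here is invented for illustration, and reset_index(drop=True) is only one of the options mentioned above (a brand new index); whether discarding the old index is acceptable depends on your data.

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose index contains a NaN label.
df = pd.DataFrame(
    {"hard_skills_name": [["Apple Products"], ["SQL"], ["Excel"]]},
    index=[0.0, np.nan, 2.0],
)

# True if at least one index label is missing.
has_null_index = df.index.isna().any()
print("Index has nulls:", has_null_index)

# Replace the index with a fresh 0..n-1 range, discarding the old labels.
clean = df.reset_index(drop=True)
print(clean.index.tolist())
```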
Upvotes: 1