Reputation: 11
I have a CSV file:
df=pd.read_csv(Path(os.getcwd()+r'\all_files.csv'), sep=',', on_bad_lines='skip', index_col=False, dtype='unicode')
column:
column=input("Column:")
word:
word=input("Word:")
I want to filter the CSV file by that column and word:
df2=df[(df[column].dropna().str.contains(word.lower()))]
But when I enter ЄДРПОУ(Гр.8) as the column,
I get an error:
Warning (from warnings module):
File "C:\python\python\FilterExcelFiles.py", line 35
df2=df[(df[column].dropna().str.contains(word.lower()))]
UserWarning: Boolean Series key will be reindexed to match DataFrame index.
Traceback (most recent call last):
File "C:\python\python\FilterExcelFiles.py", line 51, in <module>
s()
File "C:\python\python\FilterExcelFiles.py", line 35, in s
df2=df[(df[column].dropna().str.contains(word.lower()))]
File "C:\Users\Станислав\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\frame.py", line 3496, in __getitem__
return self._getitem_bool_array(key)
File "C:\Users\Станислав\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\frame.py", line 3549, in _getitem_bool_array
key = check_bool_indexer(self.index, key)
File "C:\Users\Станислав\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\indexing.py", line 2383, in check_bool_indexer
raise IndexingError(
pandas.core.indexing.IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match).
I also want to lowercase df[column].
Upvotes: 0
Views: 845
Reputation: 69
I have searched around for an answer and I came across a similar post that might have the solution for your problem.
According to that post, the error is caused by Python's default encoding, which is usually ascii; you can check the encoding with:
import sys
sys.getdefaultencoding()
To solve your problem, you need to change it to UTF-8 using the following:
import sys
reload(sys)  # Note: this line is essential for the change (Python 2 only; Python 3 has no built-in reload() or sys.setdefaultencoding())
sys.setdefaultencoding('utf-8')
Credit for the original solution goes to @jochietoch.
Upvotes: 0
Reputation: 260490
You're dropping the NaN values in the indexer, so the boolean mask is likely shorter than df and its index no longer matches df's index, which causes the error in boolean indexing.
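To make the mismatch concrete, here is a minimal sketch with made-up data (the column name `name` and its values are assumptions, not taken from the question's file). It only shows that the mask built after dropna() has fewer labels than df:

```python
import numpy as np
import pandas as pd

# Hypothetical data: one missing value in the column being searched.
df = pd.DataFrame({"name": ["Alpha", np.nan, "beta"], "n": [1, 2, 3]})

# dropna() removes the row labelled 1 before the mask is built.
mask = df["name"].dropna().str.contains("a")

mask_labels = list(mask.index)  # labels covered by the mask
df_labels = list(df.index)      # labels pandas needs a True/False for

# Indexing with the shorter mask fails because label 1 has no entry.
try:
    df[mask]
    error_name = None
except Exception as exc:
    error_name = type(exc).__name__

print(mask_labels, df_labels, error_name)
```

pandas also emits the "Boolean Series key will be reindexed" warning here, as in the question's output.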
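Don't dropna; build the mask on the full column so that it stays aligned with df. On an object-dtype column, str.contains returns NaN for missing values, and a mask that contains NaN cannot be used for indexing, so pass na=False to treat those rows as non-matches:
df2 = df[df[column].str.contains(word.lower(), na=False)]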
Alternatively, if you had an operation that would return NaNs, you could fill them with False:
df2 = df[df[column].str.contains(word.lower()).fillna(False)]
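The question also asks to lowercase df[column]. One possible way to combine that with the filter is sketched below, again on made-up data (`name`, `n` and the search word are assumptions). It lowercases both sides, uses na=False so missing values never match, and sets regex=False on the assumption that the search word should be taken literally rather than as a regular expression:

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for the CSV contents.
df = pd.DataFrame({"name": ["Alpha", np.nan, "beta", "Gamma"], "n": [1, 2, 3, 4]})
word = "ALPHA"  # stands in for the value read with input()

# Lowercase the column and the word so the match ignores case.
# na=False: missing values become False instead of NaN.
# regex=False: treat the word as plain text, not as a pattern.
mask = df["name"].str.lower().str.contains(word.lower(), na=False, regex=False)

# The mask has one entry per row of df, so boolean indexing is aligned.
df2 = df[mask]
print(df2)
```

Because the mask is computed from the full column, no reindexing warning or IndexingError is triggered.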
Upvotes: 2