Pythoner

Reputation: 11

Python pandas filter by word

I have a CSV file:

df=pd.read_csv(Path(os.getcwd()+r'\all_files.csv'), sep=',', on_bad_lines='skip', index_col=False, dtype='unicode')

The column name, read from input:

column=input("Column:")

The word to search for:

word=input("Word:")

I want to filter the rows whose value in that column contains the word:

df2=df[(df[column].dropna().str.contains(word.lower()))]

But when I enter ЄДРПОУ(Гр.8) as the column name, I get an error:

Warning (from warnings module):
  File "C:\python\python\FilterExcelFiles.py", line 35
    df2=df[(df[column].dropna().str.contains(word.lower()))]
UserWarning: Boolean Series key will be reindexed to match DataFrame index.
Traceback (most recent call last):
  File "C:\python\python\FilterExcelFiles.py", line 51, in <module>
    s()
  File "C:\python\python\FilterExcelFiles.py", line 35, in s
    df2=df[(df[column].dropna().str.contains(word.lower()))]
  File "C:\Users\Станислав\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\frame.py", line 3496, in __getitem__
    return self._getitem_bool_array(key)
  File "C:\Users\Станислав\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\frame.py", line 3549, in _getitem_bool_array
    key = check_bool_indexer(self.index, key)
  File "C:\Users\Станислав\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\indexing.py", line 2383, in check_bool_indexer
    raise IndexingError(
pandas.core.indexing.IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match).

I also want to lowercase df[column] before matching.

Upvotes: 0

Views: 845

Answers (2)

TheCSGuy

Reputation: 69

I searched around and came across a similar post that might solve your problem.

According to that post, the error is caused by Python's default encoding, which is usually ASCII. You can check it with:

import sys
sys.getdefaultencoding()

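To fix it, change the default encoding to UTF-8 as follows (this only works on Python 2; on Python 3 the default is already UTF-8 and sys.setdefaultencoding no longer exists):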

import sys
reload(sys)  # Python 2 only: essential, it restores sys.setdefaultencoding, which site.py removes at startup
sys.setdefaultencoding('utf-8')
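The snippet above is Python 2 code, while the traceback in the question comes from Python 3.10. A quick check on Python 3 (a sketch, not part of the original post) shows that the default encoding is already UTF-8 and that neither reload as a builtin nor sys.setdefaultencoding exists there:

```python
import builtins
import importlib
import sys

# On Python 3 the default string encoding is always UTF-8.
print(sys.getdefaultencoding())  # utf-8

# Both calls used in the Python 2 snippet are gone in Python 3.
print(hasattr(sys, "setdefaultencoding"))  # False
print(hasattr(builtins, "reload"))         # False

# reload() moved to importlib, but it cannot bring setdefaultencoding back.
print(callable(importlib.reload))          # True
```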

Credit for the original solution goes to @jochietoch.

Upvotes: 0

mozway

Reputation: 260490

You're dropping the NaNs inside the boolean indexer, so the mask is shorter than df and its index no longer matches df's index. pandas cannot align the two, which is what raises the IndexingError.
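A minimal reproduction, assuming a hypothetical four-row DataFrame with one missing value in place of all_files.csv. try_filter is a made-up helper that only reports which exception df[key] raises:

```python
import warnings

import numpy as np
import pandas as pd

# Hypothetical toy data standing in for all_files.csv: one missing value in "name".
df = pd.DataFrame({"name": ["alpha", np.nan, "beta", "gamma"]}, dtype=object)

# Same pattern as in the question: dropna() runs before contains(), so the mask
# only carries labels 0, 2 and 3, while df has labels 0..3.
mask = df["name"].dropna().str.contains("a")
print(len(mask), len(df))  # 3 4


def try_filter(frame, key):
    """Hypothetical helper: return the exception name raised by frame[key], or None."""
    with warnings.catch_warnings():
        # Silence the "Boolean Series key will be reindexed" UserWarning.
        warnings.simplefilter("ignore", UserWarning)
        try:
            frame[key]
        except Exception as exc:
            return type(exc).__name__
    return None


print(try_filter(df, mask))  # IndexingError
```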

Don't call dropna. Instead, let str.contains treat the missing values as False with na=False, so the mask keeps df's full index:

df2 = df[df[column].str.contains(word.lower(), na=False)]
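A sketch of the filter that also lowercases the column, as asked in the question. The data and the values of column and word are hypothetical stand-ins for the CSV and the input() calls; regex=False is my own assumption, so that the word is matched as literal text rather than as a regular expression:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; dtype=object mirrors reading the CSV with dtype='unicode'.
df = pd.DataFrame({"name": ["Alpha", np.nan, "BETA", "gamma"]}, dtype=object)
column = "name"  # stands in for input("Column:")
word = "Be"      # stands in for input("Word:")

# Lowercase the column and the word before matching; na=False turns NaN cells
# into False, so the mask has the same index as df and aligns with it.
mask = df[column].str.lower().str.contains(word.lower(), na=False, regex=False)
df2 = df[mask]
print(df2)

# Same result without lowercasing by hand: case=False ignores case.
mask_ci = df[column].str.contains(word, case=False, na=False, regex=False)
print(mask.equals(mask_ci))  # True
```

Either form works; case=False avoids building the intermediate lowercased Series.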

Alternatively, if an operation leaves NaNs in the mask, you can fill them with False:

df2 = df[df[column].str.contains(word.lower()).fillna(False)]
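A small sketch of the fillna variant, assuming hypothetical object-dtype data with one missing cell. Without na=..., str.contains leaves NaN where the value was missing, and fillna(False) turns the result into a plain True/False mask (recent pandas versions may emit a FutureWarning about downcasting here):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with a missing value, object dtype as in the question.
df = pd.DataFrame({"name": ["alpha", np.nan, "beta"]}, dtype=object)

# Without na=..., str.contains leaves NaN where the cell was missing.
raw = df["name"].str.contains("a")
print(raw.isna().tolist())  # [False, True, False]

# fillna(False) replaces those NaNs, so every entry is True or False.
mask = raw.fillna(False)
df2 = df[mask]
print(df2.index.tolist())  # [0, 2]
```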

Upvotes: 2
