Alex_P

Reputation: 2952

DataFrame slicing in Python fails

I want to slice my data in Python. The very basic task of slicing my dataframe throws unexpected errors at me.

My code is:

import pandas as pd

test_file = pd.read_csv("C:/Users/Lenovo/Desktop/testfile.csv")
test_select = test_file[["Category", "Shop"]]
print(test_select[1,1])

The code print(test_select[1,1]) should display the second row of the second column.

The error message:

File "pandas\_libs\hashtable_class_helper.pxi", line 1500, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: (1, 1)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:/Users/Lenovo/.PyCharmCE2018.1/config/scratches/Dictionary.py", line 8, in <module>
    print(h_select[1,1])
  File "C:\Users\Lenovo\PycharmProjects\mindnotez\venv\lib\site-packages\pandas\core\frame.py", line 2688, in __getitem__
    return self._getitem_column(key)
  File "C:\Users\Lenovo\PycharmProjects\mindnotez\venv\lib\site-packages\pandas\core\frame.py", line 2695, in _getitem_column
    return self._get_item_cache(key)
  File "C:\Users\Lenovo\PycharmProjects\mindnotez\venv\lib\site-packages\pandas\core\generic.py", line 2489, in _get_item_cache
    values = self._data.get(item)
  File "C:\Users\Lenovo\PycharmProjects\mindnotez\venv\lib\site-packages\pandas\core\internals.py", line 4115, in get
    loc = self.items.get_loc(item)
  File "C:\Users\Lenovo\PycharmProjects\mindnotez\venv\lib\site-packages\pandas\core\indexes\base.py", line 3080, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas\_libs\index.pyx", line 140, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\hashtable_class_helper.pxi", line 1492, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas\_libs\hashtable_class_helper.pxi", line 1500, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: (1, 1)

When I run print(test_select.head()), I get the following output:

     Category           Shop
0       Jidlo         Albert
1       Jidlo          BILLA
2       Jidlo         Albert
3       Jidlo         Albert
4  Restaurant  Kockafé Freyd

Slicing the dataframe with print(test_select[1:4]) prints rows 1 to 3. With the command print(test_select[1,1]), I want the second row of the second column. However, I receive the error message above.

Why do I receive the KeyError exception? What am I missing?

I use:

Upvotes: 1

Views: 3011

Answers (4)

ayorgo

Reputation: 3902

When you want to slice a dataframe

By row number

df.iloc[[1, 5]] # to get rows 1 and 5

df.iloc[1:6] # to get rows 1 to 5 inclusive

You can also narrow it down to a specific column as follows (to avoid chained indexing)

df.iloc[[1, 5], df.columns.get_loc('Shop')]

or multiple columns

df.iloc[[1, 5], df.columns.get_indexer(['Shop', 'Category'])]
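For a quick check of the positional calls above, here is a minimal sketch. It assumes a frame named df rebuilt from the five rows shown in the question's head() output, and it uses positions 1 and 4 instead of 1 and 5 because that sample has only five rows.

```python
import pandas as pd

# Sample data copied from the head() output in the question (an assumption:
# the real CSV presumably has more rows and columns).
df = pd.DataFrame({
    "Category": ["Jidlo", "Jidlo", "Jidlo", "Jidlo", "Restaurant"],
    "Shop": ["Albert", "BILLA", "Albert", "Albert", "Kockafé Freyd"],
})

# Translate the column name into its integer position, then select
# the rows at positions 1 and 4 of that column. The result is a Series.
shop_col = df.columns.get_loc("Shop")
shops = df.iloc[[1, 4], shop_col]

# Same rows, several columns looked up by name. The result is a DataFrame
# whose columns follow the order in which the names were requested.
col_positions = df.columns.get_indexer(["Shop", "Category"])
subset = df.iloc[[1, 4], col_positions]

print(shops.tolist())
print(list(subset.columns))
```

Looking up the column position first keeps everything in a single iloc call, so rows and columns are selected in one step.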

By label based index

# Numeric
df.loc[[1, 5]] # 1 and 5 are considered labels here
df.loc[[1, 5], 'Shop']
df.loc[[1, 5], ['Shop', 'Category']]

# Textual or otherwise
df.set_index('Shop', inplace=True)
df.loc[['BILLA', 'Albert'], 'Category']
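The label-based variant can be tried the same way. This is only a sketch on the five sample rows from the question; the names df, by_shop and cats are illustrative, and set_index is used without inplace=True so the original frame stays untouched.

```python
import pandas as pd

# Sample data copied from the head() output in the question.
df = pd.DataFrame({
    "Category": ["Jidlo", "Jidlo", "Jidlo", "Jidlo", "Restaurant"],
    "Shop": ["Albert", "BILLA", "Albert", "Albert", "Kockafé Freyd"],
})

# Numeric labels: with the default index, 1 and 4 are labels that
# happen to coincide with the row positions.
picked = df.loc[[1, 4], ["Shop", "Category"]]

# Textual labels: use the shop name as the index and select by it.
by_shop = df.set_index("Shop")
cats = by_shop.loc[["BILLA", "Albert"], "Category"]

print(picked)
print(cats)
```

Because 'Albert' occurs three times in the sample, the last selection returns every matching row, so cats has four entries and not two.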

Upvotes: 4

ajay kizhakumkara

Reputation: 1

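If you want the second row of the second column, you have to use df.iloc[1,1]. iloc extracts data by integer position.

[1,1] takes the row at position 1 and the column at position 1. Positions start at 0, so this is the second row and the second column. The output would be 'BILLA'.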

Upvotes: 0

BENY

Reputation: 323226

Use loc, which selects by index label and column name rather than by position. Here your index appears to run from 0 to n, so a row label passed to loc picks the same row as that position passed to iloc.

df.loc[1,'Shop']
'BILLA'
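The equivalence only holds while the index is the default 0 to n range. A small sketch on the question's sample rows shows where the two diverge; the names df and tail are illustrative.

```python
import pandas as pd

# Sample data copied from the head() output in the question.
df = pd.DataFrame({
    "Category": ["Jidlo", "Jidlo", "Jidlo", "Jidlo", "Restaurant"],
    "Shop": ["Albert", "BILLA", "Albert", "Albert", "Kockafé Freyd"],
})

# With the default index, label 1 and position 1 are the same row.
same_row = df.loc[1, "Shop"] == df.iloc[1, 1]

# After dropping the first two rows the labels are 2, 3, 4 while the
# positions restart at 0, so label 2 and position 2 are different rows.
tail = df.iloc[2:]
by_label = tail.loc[2, "Shop"]
by_position = tail.iloc[2, 1]

# Slices differ as well: loc includes the end label, iloc excludes
# the end position.
n_loc = len(df.loc[1:3])
n_iloc = len(df.iloc[1:3])

print(same_row, by_label, by_position, n_loc, n_iloc)
```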

Upvotes: 2

jpp

Reputation: 164663

The code print(test_select[1,1]) should display the second row of the second column.

No, it shouldn't. The syntax df[x] is usually reserved for retrieving a column (series), Boolean row indexing, or row slicing. These uses of pd.DataFrame.__getitem__, for which df[] is syntactic sugar, aren't conveniently documented. In general, they should be considered shortcuts, and if you are unsure you should prefer loc / iloc / at / iat, as appropriate.
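To make the three df[x] uses concrete, here is a minimal sketch on the sample rows from the question. The variable names are illustrative, and the last part reproduces the KeyError: the tuple (1, 1) is treated as a single column label, which does not exist.

```python
import pandas as pd

# Sample data copied from the head() output in the question.
test_select = pd.DataFrame({
    "Category": ["Jidlo", "Jidlo", "Jidlo", "Jidlo", "Restaurant"],
    "Shop": ["Albert", "BILLA", "Albert", "Albert", "Kockafé Freyd"],
})

# A single label selects a column and returns a Series.
shop = test_select["Shop"]

# A Boolean Series filters rows.
restaurants = test_select[test_select["Category"] == "Restaurant"]

# A slice selects rows (by position for this default integer index).
middle = test_select[1:4]

# A tuple is looked up as one column label, so (1, 1) is not found.
try:
    test_select[1, 1]
    raised_key_error = False
except KeyError:
    raised_key_error = True

print(raised_key_error)
```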

To retrieve a scalar value via integer positional indexing, you can use pd.DataFrame.iat:

df.iat[1, 1]
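As a sketch on the question's sample rows, iat takes integer positions, and its label-based counterpart at takes the index label and column name. With the default index both reach the same cell.

```python
import pandas as pd

# Sample data copied from the head() output in the question.
test_select = pd.DataFrame({
    "Category": ["Jidlo", "Jidlo", "Jidlo", "Jidlo", "Restaurant"],
    "Shop": ["Albert", "BILLA", "Albert", "Albert", "Kockafé Freyd"],
})

# Second row, second column, addressed by position.
value_by_position = test_select.iat[1, 1]

# The same cell, addressed by index label and column name.
value_by_label = test_select.at[1, "Shop"]

print(value_by_position, value_by_label)
```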

Upvotes: 3
