bsky

Reputation: 20222

TypeError: unhashable type: 'slice' for pandas

I have a pandas DataFrame, which I create like this:

test_inputs = pd.read_csv("../input/test.csv", delimiter=',')

Its shape

print(test_inputs.shape)

is this

(28000, 784)

I would like to print a subset of its rows, like this:

print(test_inputs[100:200, :])
print(test_inputs[100:200, :].shape)

However, I am getting:

TypeError: unhashable type: 'slice'

Any idea what could be wrong?

Upvotes: 14

Views: 42363

Answers (4)

Leonid Mednikov

Reputation: 973

Indexing in pandas is really confusing, because it looks like list indexing but it is not. You need to use .iloc, which indexes by position:

print(test_inputs.iloc[100:200, :])

If you don't need to select columns, you can omit the column part:

print(test_inputs.iloc[100:200])
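A minimal sketch of both forms, assuming a small random stand-in for test.csv (300 rows instead of 28000, so it runs quickly; the random data is purely hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for test.csv: random pixel values,
# 300 rows x 784 columns instead of the real 28000 x 784.
rng = np.random.default_rng(0)
test_inputs = pd.DataFrame(rng.integers(0, 256, size=(300, 784)))

# Positional row slice with an explicit "all columns" part.
subset = test_inputs.iloc[100:200, :]
print(subset.shape)  # (100, 784)

# Omitting the column part selects the same rows and columns.
same_subset = test_inputs.iloc[100:200]
print(same_subset.equals(subset))  # True
```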

P.S. Using .loc is not what you want here, because it looks up rows not by their position but by their index label (and index labels can be anything: not necessarily numbers, not even unique). A range in .loc finds the rows with index labels 100 and 200 and returns everything between them, including both endpoints. Right after the DataFrame is created, .iloc and .loc give nearly the same result (.loc also includes row 200), but relying on .loc here is very bad practice: as soon as the index changes for some reason (for example, after you select a subset of rows), row positions and index labels no longer match, which leads to hard-to-diagnose problems.
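A small sketch of how .loc and .iloc diverge once the index no longer matches row positions. The frame and the `evens` filter are hypothetical, chosen only to create a gap in the index:

```python
import pandas as pd

df = pd.DataFrame({"value": range(10)})  # default index labels 0..9

# Keep only even values: the index labels are now 0, 2, 4, 6, 8.
evens = df[df["value"] % 2 == 0]

by_position = evens.iloc[1:3]  # 2nd and 3rd rows (end excluded): labels 2, 4
by_label = evens.loc[1:3]      # labels between 1 and 3, inclusive: only label 2

print(list(by_position.index))  # [2, 4]
print(list(by_label.index))     # [2]
```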

P.P.S. You can use test_inputs[100:200], but not test_inputs[100:200, :], because the pandas designers tried to combine several popular indexing conventions in a single [] operator. test_inputs['column'] is equivalent to test_inputs.loc[:, 'column'], but, surprisingly, slicing with integers, test_inputs[100:200], is equivalent to test_inputs.iloc[100:200] (while slicing with non-integer values is equivalent to .loc row slicing). And if you pass a pair of values to [], it is treated as a tuple key for multi-level column indexing, so multi_level_columns_df['level_1', 'level_2'] is equivalent to multi_level_columns_df.loc[:, ('level_1', 'level_2')]. That is why your original construction led to the error: test_inputs[100:200, :] passes the tuple (slice(100, 200), slice(None)) as a column key, and a slice can't be used as part of such a key (the lookup tries to hash it, hence "unhashable type: 'slice'").
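A sketch of the [] rules described above, on small hypothetical frames (`df` and `mi_df` are illustrative names). The exact exception raised by the comma form depends on the pandas and Python versions (TypeError as in the question, or an InvalidIndexError/KeyError on newer setups), so the sketch only checks that it fails:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(20).reshape(5, 4), columns=list("abcd"))

# A comma inside [] builds a tuple, which [] treats as a column key.
raised = False
try:
    df[1:3, :]
except Exception as exc:  # exception type varies across versions
    raised = True
    print(type(exc).__name__)

# Integer slice alone: positional row selection, same as .iloc.
print(df[1:3].equals(df.iloc[1:3]))  # True

# Single label: column selection, same as .loc[:, label].
print(df["b"].equals(df.loc[:, "b"]))  # True

# A pair of labels is a tuple key into multi-level columns.
cols = pd.MultiIndex.from_tuples([("x", "a"), ("x", "b"), ("y", "a")])
mi_df = pd.DataFrame(np.arange(6).reshape(2, 3), columns=cols)
print(mi_df["x", "a"].equals(mi_df.loc[:, ("x", "a")]))  # True
```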

Upvotes: 10

user_gautam

Reputation: 331

I was facing the same problem, and even the solutions above couldn't fix it. It seemed to be a problem with pandas, so I converted the DataFrame into a NumPy array, which fixed the issue.

import pandas as pd
import numpy as np
test_inputs = pd.read_csv("../input/test.csv", delimiter=',')
test_inputs = np.asarray(test_inputs)
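A sketch of why the conversion helps: a 2-D NumPy array accepts the `[rows, cols]` slice syntax directly, at the cost of dropping the column names and index. The small frame and its column names below are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical small frame standing in for the CSV data.
test_inputs = pd.DataFrame(np.arange(30).reshape(10, 3), columns=["p0", "p1", "p2"])

arr = np.asarray(test_inputs)  # plain 2-D ndarray, labels discarded
subset = arr[2:5, :]           # NumPy supports the row, column slice form
print(subset.shape)            # (3, 3)
print(type(subset).__name__)   # ndarray
```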

Upvotes: 0

vipin bansal

Reputation: 896

print(test_inputs.values[100:200, :])
print(test_inputs.values[100:200, :].shape)

This code also works for me.
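The pandas documentation recommends DataFrame.to_numpy() over .values for getting the underlying array; a minimal sketch on a hypothetical frame showing that both give the same slice:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(4, 3))  # hypothetical example frame

via_values = df.values[1:3, :]
via_to_numpy = df.to_numpy()[1:3, :]

print(np.array_equal(via_values, via_to_numpy))  # True
print(via_to_numpy.shape)                        # (2, 3)
```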

Upvotes: 0

jezrael

Reputation: 862611

There are several possible solutions, but their output is not the same:

loc selects by label, so both the start and the end bound are included. iloc and plain slicing (without loc or iloc) select by position: the start bound is included, while the upper bound is excluded (see the docs, selection by position):

import numpy as np
import pandas as pd

test_inputs = pd.DataFrame(np.random.randint(10, size=(28, 7)))

print(test_inputs.loc[10:20])
    0  1  2  3  4  5  6
10  3  2  0  6  6  0  0
11  5  0  2  4  1  5  2
12  5  3  5  4  1  3  5
13  9  5  6  6  5  0  1
14  7  0  7  4  2  2  5
15  2  4  3  3  7  2  3
16  8  9  6  0  5  3  4
17  1  1  0  7  2  7  7
18  1  2  2  3  5  8  7
19  5  1  1  0  1  8  9
20  3  6  7  3  9  7  1

print(test_inputs.iloc[10:20])
    0  1  2  3  4  5  6
10  3  2  0  6  6  0  0
11  5  0  2  4  1  5  2
12  5  3  5  4  1  3  5
13  9  5  6  6  5  0  1
14  7  0  7  4  2  2  5
15  2  4  3  3  7  2  3
16  8  9  6  0  5  3  4
17  1  1  0  7  2  7  7
18  1  2  2  3  5  8  7
19  5  1  1  0  1  8  9

print(test_inputs[10:20])
    0  1  2  3  4  5  6
10  3  2  0  6  6  0  0
11  5  0  2  4  1  5  2
12  5  3  5  4  1  3  5
13  9  5  6  6  5  0  1
14  7  0  7  4  2  2  5
15  2  4  3  3  7  2  3
16  8  9  6  0  5  3  4
17  1  1  0  7  2  7  7
18  1  2  2  3  5  8  7
19  5  1  1  0  1  8  9
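The row counts make the difference explicit. This sketch seeds the generator so it is reproducible, so its values differ from the unseeded output above:

```python
import numpy as np
import pandas as pd

# Same shape as above; seeded for reproducibility.
rng = np.random.default_rng(42)
test_inputs = pd.DataFrame(rng.integers(0, 10, size=(28, 7)))

by_label = test_inputs.loc[10:20]      # labels 10..20, both ends included
by_position = test_inputs.iloc[10:20]  # positions 10..19, end excluded
plain_slice = test_inputs[10:20]       # integer slice: positional, like iloc

print(len(by_label), len(by_position), len(plain_slice))  # 11 10 10
print(plain_slice.equals(by_position))                    # True
```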

Upvotes: 8
