Why is my df searching column indexes rather than checking column names?

Question

My df matrix looks like this:

           rating
id       10153337 10183250 10220967   ...    99808270 99816554 99821259
user_id                               ...
10003869      NaN      8.0      NaN   ...         NaN      NaN      NaN
10022889      NaN      NaN      3.0   ...         NaN      1.0      NaN

I can't select the column I need; the lookup raises an 'indices are out-of-bounds' error:

specificID = ratings_matrix[[99816554]]
...
     raise IndexError("indices are out-of-bounds")
IndexError: indices are out-of-bounds

Why isn't it matching the value I pass against the column labels?

Some runnable code:

import pandas as pd
from io import StringIO  # newer pandas expects a file-like object, not a raw JSON string

ratings = pd.read_json(
    StringIO(''.join(
        ['{"columns":["id","rating","user_id"],"index":[0,1,2],"data":[[',
         '67728134,4,10003869],[57495823,9,10060085],[99816554,1,10022889]]}']
    )), orient='split')

ratings
ratings.dtypes

ratings_matrix = ratings.pivot_table(index=['user_id'], columns=['id'], values=['rating'])
ratings_matrix.columns.map(type)
ratings_matrix[[67728134]]  # here! looks up column positions rather than labels

piRSquared · Accepted Answer

Notice that when you created your pivot, you passed a list to the values parameter:

ratings_matrix = ratings.pivot_table(
    index=['user_id'], columns=['id'], values=['rating'])  # <--- here: values is a list

This told pandas to create a pd.MultiIndex for the columns. That's why you have two levels of columns, with rating on top, in your result.
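A quick way to check this for yourself, as a minimal sketch: I build the same three rows directly with pd.DataFrame (my own shortcut to skip the JSON step) and compare how many column levels each call produces. The names nested and flat are only illustrative.

```python
import pandas as pd

# Same three rows as in the question, built directly (assumption: skips read_json).
ratings = pd.DataFrame({
    "id": [67728134, 57495823, 99816554],
    "rating": [4, 9, 1],
    "user_id": [10003869, 10060085, 10022889],
})

# values passed as a list: the columns become a two-level MultiIndex.
nested = ratings.pivot_table(index=['user_id'], columns=['id'], values=['rating'])

# values passed as a plain string: the columns stay a single-level Index.
flat = ratings.pivot_table(index=['user_id'], columns=['id'], values='rating')

print(nested.columns.nlevels)               # 2
print(flat.columns.nlevels)                 # 1
print(('rating', 99816554) in nested.columns)  # True
print(99816554 in flat.columns)                # True
```

So with the list form, each column label is a tuple like ('rating', 99816554), not the bare id.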

option 1
use the multiindex

specificID = ratings_matrix[[('rating', 99816554)]]
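If the matrix is already built with two-level columns, another route is to select the top level 'rating' first, which should leave a frame whose columns are the plain ids. This is not one of the options above, just a sketch on the same sample data, again built directly with pd.DataFrame for brevity; by_tuple and by_level are illustrative names.

```python
import pandas as pd

# Same three rows as in the question, built directly (assumption: skips read_json).
ratings = pd.DataFrame({
    "id": [67728134, 57495823, 99816554],
    "rating": [4, 9, 1],
    "user_id": [10003869, 10060085, 10022889],
})

ratings_matrix = ratings.pivot_table(index=['user_id'], columns=['id'], values=['rating'])

# Option 1 from above: address the column by its full tuple label.
by_tuple = ratings_matrix[[('rating', 99816554)]]

# Alternative: take the 'rating' level first, then select by id.
by_level = ratings_matrix['rating'][[99816554]]

print(by_tuple.shape)             # (3, 1)
print(by_level.columns.tolist())  # [99816554]
```

Both give a one-column frame; the second one just drops the rating level from the column labels.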

option 2
don't create the multiindex

ratings_matrix = ratings.pivot_table( # see what I did?
    index=['user_id'], columns=['id'], values='rating')

Then

specificID = ratings_matrix[[99816554]]

setup

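import pandas as pd
from io import StringIO  # newer pandas expects a file-like object, not a raw JSON string

# named ratings (not df) so the pivot_table calls below find it
ratings = pd.read_json(
    StringIO(''.join(
        ['{"columns":["id","rating","user_id"],"index":[0,1,2],"data":[[',
         '67728134,4,10003869],[57495823,9,10060085],[99816554,1,10022889]]}']
    )), orient='split'
)

ratings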

ratings_matrix = ratings.pivot_table( # |<--- here --->|
    index=['user_id'], columns=['id'], values=['rating'])
ratings_matrix[[('rating', 67728134)]]

ratings_matrix = ratings.pivot_table( # see what I did?
    index=['user_id'], columns=['id'], values='rating')
ratings_matrix[[67728134]]
