Reputation: 1667
I have found what looks like an inconsistency (at least to me) between the following two approaches:
For a dataframe defined as:
df = pd.DataFrame([[1, 2, 3, 4, np.nan], [8, 2, 0, 4, 5]])
I would like to access the element in the 1st row, 4th column (counting from 0). I either do this:
df[4][1]
Out[94]: 5.0
Or this:
df.iloc[1,4]
Out[95]: 5.0
Am I correctly understanding that in the first approach I need to use the column first and then the rows, and vice versa when using iloc? I just want to make sure that I use both approaches correctly going forward.
EDIT: Some of the answers below have pointed out that the first approach is not as reliable, and I see now that this is why:
df.index = ['7','88']
df[4][1]
Out[101]: 5.0
I still get the correct result. But with an integer index, an exception is raised if the number I ask for is no longer among the labels:
df.index = [7,88]
df[4][1]
KeyError: 1
Also, changing the column names:
df.columns = ['4','5','6','1','5']
df['4'][1]
Out[108]: 8
Gives me a different result. So overall, I should stick to iloc or loc to avoid these issues.
Upvotes: 1
Views: 245
Reputation: 1356
Unfortunately, you are not using them correctly. It's just a coincidence that you get the same result.
df.loc[i, j] means the element in df with the row named i and the column named j.
Among other differences, df[j] means the column named j, and df[j][i] means the element named i (here, the row label) within the column named j.
df.iloc[i, j] means the element in the i-th row and the j-th column, counting from 0.
So df.loc selects data by label (a string, an int, or any other type; int in this case), while df.iloc selects data by position. It's just a coincidence that in your example the i-th row is also named i.
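A minimal sketch of the label/position distinction, using the question's data but with made-up row labels (10 and 20, chosen only for illustration) so that labels and positions no longer coincide:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2, 3, 4, np.nan], [8, 2, 0, 4, 5]])
df.index = [10, 20]  # hypothetical row labels, chosen so they differ from positions

by_position = df.iloc[1, 4]  # second row, fifth column, counting from 0
by_label = df.loc[20, 4]     # row labeled 20, column labeled 4

print(by_position, by_label)  # both are 5.0

# No row is labeled 1 any more, so a label lookup for 1 fails.
try:
    df.loc[1, 4]
    label_1_exists = True
except KeyError:
    label_1_exists = False
```

Here df.iloc keeps working regardless of what the rows are called, while df.loc only accepts labels that actually exist.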
For more details, you should read the documentation.
Update:
Think of df[4][1] as a convenience shortcut. There is some fallback logic behind it, so under most circumstances you'll get what you want.
In fact
df.index = ['7', '88']
df[4][1]
works because the index holds strings and you pass the int 1, so the lookup falls back to positional indexing. If you run:
df.index = [7, 88]
df[4][1]
it will raise a KeyError. And
df.index = [1, 0]
df[4][1]
still won't give the element you expect, because it does not select row 1 counting from 0. It selects the row named 1.
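The last two cases can be sketched as follows, assuming the same df as in the question. The string-index case is left out on purpose: as far as I know, newer pandas releases deprecate that positional fallback, so its behavior depends on the installed version.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2, 3, 4, np.nan], [8, 2, 0, 4, 5]])

# Integer row labels that do not include 1: the lookup is by label and fails.
df.index = [7, 88]
try:
    df[4][1]
    raised = False
except KeyError:
    raised = True

# Integer row labels in reversed order: label 1 now belongs to the first row.
df.index = [1, 0]
by_label = df[4][1]          # row labeled 1, i.e. the first row (NaN here)
by_position = df.iloc[1, 4]  # second row by position (5.0)

print(raised, by_label, by_position)
```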
Upvotes: 2
Reputation: 5109
You should think of a DataFrame as a collection of columns. When you do df[4], you get the column of df labeled 4, which is a pandas Series. When you then do df[4][1], you get the element labeled 1 in that Series, which corresponds to the entry in row 1 and column 4 of the DataFrame. That is exactly what df.iloc[1,4] returns.
So there is no inconsistency at all, but beware: this works only if you have not set any column names, or if your column names are [0,1,2,3,4]. Otherwise it will either fail or give you a wrong result. Hence, stick with iloc for positional indexing and with loc for indexing by name.
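A short sketch of the column-first view, with hypothetical column names a to e standing in for any non-default labels:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2, 3, 4, np.nan], [8, 2, 0, 4, 5]])

col = df[4]                # the column labeled 4, returned as a Series
print(type(col).__name__)  # Series

df.columns = list("abcde")  # hypothetical names; the label 4 no longer exists
try:
    df[4]
    found = True
except KeyError:
    found = False

by_position = df.iloc[1, 4]  # still the second row, fifth column
by_label = df.loc[1, "e"]    # row labeled 1, column labeled "e"
print(found, by_position, by_label)
```

After the rename, df[4] fails because no column carries that label, while iloc and loc both still reach the same cell.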
Upvotes: 2