Reputation: 19
I have some code that calculates the column mean of all values greater than or equal to zero. For some reason I get a different output when I start from the back with .iloc[-1:] than if I just do the whole column or start somewhere else.
import pandas as pd
dtest = {'col1': [1, -2, -1, -1, -5], 'col2': [3, 4, -2, -1, -5]}
dftest = pd.DataFrame(data=dtest)
dftest
dftest[dftest['col2'] >= 0].iloc[-1:].mean().values[1]
When I run this code I get a mean of 4.0
But when I run this code with iloc[:]
dftest[dftest['col2'] >= 0].iloc[:].mean().values[1]
I get 3.5
and iloc[-2:] or [0:] or [-0:] also gives me 3.5.
Why does it differ?
Upvotes: 0
Views: 35
Reputation: 81
As Yati said, the indexer [-1:] won't do the trick, because it keeps only the last row. To go through the rows starting from the back, reverse them with [::-1], something like this:
dftest[dftest['col2'] >= 0].iloc[::-1].mean().values[1]
#--or--
dftest['col2'][dftest['col2'] >= 0].iloc[::-1].mean()
Both versions give an average of 3.5.
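As a rough sketch of why the reversed slice works: [::-1] only changes the order of the rows, not which rows are kept, so the mean is the same in either direction. The variable names below (positive, forward_mean and so on) are just illustrative, and selecting the column by name with .loc is a substitute for .values[1], not something from the question.

```python
import pandas as pd

dftest = pd.DataFrame({'col1': [1, -2, -1, -1, -5],
                       'col2': [3, 4, -2, -1, -5]})

# Keep only the non-negative values of col2, selected by column name
positive = dftest.loc[dftest['col2'] >= 0, 'col2']

# Reversing changes the order of the rows, not their contents
reversed_values = positive.iloc[::-1].tolist()

forward_mean = positive.mean()
reversed_mean = positive.iloc[::-1].mean()

print(reversed_values, forward_mean, reversed_mean)
```

Picking the column by name avoids depending on the column order that .values[1] assumes.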
Upvotes: 1
Reputation: 448
dftest[dftest['col2'] >= 0].iloc[-1:].mean().values[1]
In the code above, the filter [dftest['col2'] >= 0] keeps only the first two rows, where col2 holds the non-negative values 3 and 4. When you then apply iloc[-1:], it selects only the last of those rows, the one containing 4. The mean of a single element is that element, so the result is 4.0.
If you use iloc[:] instead of iloc[-1:], you select all the rows, that is, 3 and 4, and their mean is 3.5.
It is important to understand how slicing with negative integers works: a slice of the form [-n:] picks the last n elements.
When you use iloc[-2:], you select the last 2 elements (3 and 4). iloc[0:] is the same as iloc[-0:], since -0 is just 0, and it selects everything from position 0 to the end of the series, i.e., both 3 and 4.
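To make the slicing behaviour concrete, here is a small sketch that applies each slice to the filtered frame from the question. The variable names are only for illustration. It also includes iloc[-2:0] with an explicit stop of 0, which is different from iloc[-2:] and returns no rows at all.

```python
import pandas as pd

dftest = pd.DataFrame({'col1': [1, -2, -1, -1, -5],
                       'col2': [3, 4, -2, -1, -5]})

# Only rows 0 and 1 survive the filter (col2 values 3 and 4)
filtered = dftest[dftest['col2'] >= 0]

last_one = filtered.iloc[-1:]['col2'].mean()    # last row only: 4
last_two = filtered.iloc[-2:]['col2'].mean()    # last two rows: 3 and 4
everything = filtered.iloc[:]['col2'].mean()    # all rows: 3 and 4
from_zero = filtered.iloc[-0:]['col2'].mean()   # -0 == 0, so all rows again

# An explicit stop of 0 ends the slice before the first row
empty = filtered.iloc[-2:0]

print(last_one, last_two, everything, from_zero, len(empty))
```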
Upvotes: 2