Mean of column varies based

Question

I have some code that calculates the mean of all values in a column that are greater than or equal to zero. For some reason I get a different result when I start from the back with .iloc[-1:] than when I use the whole column or start somewhere else.

import pandas as pd

dtest = {'col1': [1, -2, -1, -1, -5], 'col2': [3, 4, -2, -1, -5]}
dftest = pd.DataFrame(data=dtest)
dftest

dftest[dftest['col2'] >= 0].iloc[-1:].mean().values[1]

When I run this code, I get a mean of 4.0.

But when I run it with iloc[:] instead:

dftest[dftest['col2'] >= 0].iloc[:].mean().values[1]

I get 3.5.

Using iloc[-2:], iloc[0:], or iloc[-0:] also gives me 3.5.

Why does it differ?

Yati Raj · Accepted Answer

dftest[dftest['col2'] >= 0].iloc[-1:].mean().values[1]

In the code above, the filter [dftest['col2'] >= 0] keeps only the rows where col2 is non-negative, which are the first two rows. Their col2 values are 3 and 4. When you then do iloc[-1:], it selects only the last of those rows, the one whose col2 value is 4. The mean of a single value is that value itself, so you get 4.0 as the result (.values[1] picks the col2 entry out of the per-column means).
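A minimal sketch of the steps just described, rebuilding the question's DataFrame. The intermediate names `filtered`, `last_row`, and `means` are hypothetical and only there to make each step visible; indexing the result by the label 'col2' is used here as a readable stand-in for .values[1].

```python
import pandas as pd

dftest = pd.DataFrame({'col1': [1, -2, -1, -1, -5], 'col2': [3, 4, -2, -1, -5]})

# The boolean mask keeps whole rows where col2 is non-negative: rows 0 and 1.
filtered = dftest[dftest['col2'] >= 0]
print(filtered)

# iloc[-1:] keeps only the last row of the filtered frame (row label 1).
last_row = filtered.iloc[-1:]
print(last_row)

# mean() averages each column; with a single row, each mean equals that row's value.
means = last_row.mean()
print(means['col2'])  # 4.0, the same entry as .values[1]
```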

If you use iloc[:] instead of iloc[-1:], you select all rows of the filtered frame, so col2 contributes both 3 and 4, and their mean is (3 + 4) / 2 = 3.5.
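The comparison below is a sketch contrasting the two selections on the same filtered frame. The last line is a suggested alternative, not something from the question: if the goal is simply the mean of the non-negative col2 values, selecting the column with .loc avoids both the iloc slice and the positional .values[1] lookup. Variable names are illustrative.

```python
import pandas as pd

dftest = pd.DataFrame({'col1': [1, -2, -1, -1, -5], 'col2': [3, 4, -2, -1, -5]})
filtered = dftest[dftest['col2'] >= 0]

# iloc[:] keeps every row of the filtered frame, so both col2 values (3 and 4) are averaged.
all_rows_mean = filtered.iloc[:].mean()['col2']
print(all_rows_mean)  # 3.5

# iloc[-1:] keeps only the last row, so the "mean" is just that row's col2 value.
last_row_mean = filtered.iloc[-1:].mean()['col2']
print(last_row_mean)  # 4.0

# Suggested direct form: filter rows and pick the column in one step, then average.
direct_mean = dftest.loc[dftest['col2'] >= 0, 'col2'].mean()
print(direct_mean)  # 3.5
```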

It is important to understand indexing with negative integers: a slice like [-n:] starts n positions from the end, so it picks the last n elements.
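To illustrate the [-n:] rule, here is a short sketch using the original col2 values, both as a pandas Series with .iloc and as a plain Python list, since positional slicing follows the same convention in both.

```python
import pandas as pd

s = pd.Series([3, 4, -2, -1, -5])  # col2 before any filtering

# [-n:] starts n positions from the end and runs to the end, keeping the last n items.
print(s.iloc[-1:].tolist())  # [-5]
print(s.iloc[-2:].tolist())  # [-1, -5]

# Plain Python lists follow the same slicing rule.
values = [3, 4, -2, -1, -5]
print(values[-2:])  # [-1, -5]
```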

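When you do iloc[-2:], you select the last 2 rows, and because the filtered frame has only two rows, that is all of them (col2 values 3 and 4). iloc[0:] is the same as iloc[-0:], since -0 is also 0, and it selects everything from position 0 to the end, again 3 and 4. All three slices therefore give a mean of 3.5.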
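A sketch checking that the three slices from the question select the same two rows of the filtered frame and so produce the same col2 mean. The dictionary `results` is just an illustrative container.

```python
import pandas as pd

dftest = pd.DataFrame({'col1': [1, -2, -1, -1, -5], 'col2': [3, 4, -2, -1, -5]})
filtered = dftest[dftest['col2'] >= 0]  # two rows; col2 values 3 and 4

# With only two rows, "the last 2 rows" and "all rows" are the same selection.
print(filtered.iloc[-2:]['col2'].tolist())  # [3, 4]

# -0 == 0 in Python, so iloc[-0:] is exactly the slice iloc[0:].
print(-0 == 0)  # True

# Each of these slices covers both rows, so each col2 mean is 3.5.
results = {
    'iloc[-2:]': filtered.iloc[-2:].mean()['col2'],
    'iloc[0:]': filtered.iloc[0:].mean()['col2'],
    'iloc[-0:]': filtered.iloc[-0:].mean()['col2'],
}
for label, value in results.items():
    print(label, value)
```

Only iloc[-1:] differs, because it is the only one of these slices that drops a row.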
