strange ix selection in pandas with duplicate indices

Question

There is something I don't understand about the ix selector in pandas.

Consider the following DataFrame:

import pandas as pd
dfnu = pd.DataFrame({'A': [7, 1, 2, 3, 4], 'B': [7, 8, 9, 1, 1]}, index=list('AABCD'))

Now look at this output:

dfnu['A']<2
Out[128]: 
A    False
A     True
B    False
C    False
D    False
Name: A, dtype: bool


dfnu['test']=dfnu.ix[dfnu['A']<2,'A']
dfnu
Out[127]: 
   A  B  test
A  7  7     1
A  1  8     1
B  2  9   NaN
C  3  1   NaN
D  4  1   NaN

What is going on here? Why on earth is test equal to 1 on the first row?

BrenBarn · Accepted Answer

Since there is only one row with A<2, dfnu.ix[dfnu['A']<2, 'A'] has only one value:

>>> dfnu.ix[dfnu['A']<2, 'A']
A    1
Name: A, dtype: int64

When you assign this back into dfnu, the values are matched on the index label, not on row position. Because the one row shown above has the index label A, its value (1) is assigned to every row in the original DataFrame labeled A. This is also why the other rows get NaN: they have no matching label in the selection, so no value is assigned to them.
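
The alignment described above can be reproduced with a short sketch. Two things here are my own assumptions rather than part of the original question: it uses .loc in place of .ix (which was deprecated and later removed in pandas 1.0), and the names mask, selected, aligned and positional are purely illustrative. It also shows one possible way to keep the value only on the row that actually matched, using Series.where, which preserves the full index instead of dropping rows.

```python
import pandas as pd

dfnu = pd.DataFrame({'A': [7, 1, 2, 3, 4], 'B': [7, 8, 9, 1, 1]},
                    index=list('AABCD'))
mask = dfnu['A'] < 2

# .loc stands in for .ix here (an assumption for newer pandas versions)
selected = dfnu.loc[mask, 'A']  # one-element Series whose index label is 'A'

# Label-aligned assignment: every row labeled 'A' receives the value 1
aligned = dfnu.copy()
aligned['test'] = selected

# Alternative: where() keeps all rows and puts NaN where the mask is False,
# so the result lines up with dfnu row by row
positional = dfnu.copy()
positional['test'] = dfnu['A'].where(mask)

print(aligned)
print(positional)
```

In the first result both rows labeled A carry the value, as in the question; in the second only the second row does. Note that if the selection itself contained a duplicated label, the aligned assignment would be expected to fail rather than guess which value goes where.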
