Reputation: 159
I have a DataFrame to which I am adding several boolean columns. I initialize each column to False and then set some of its values to True. If I do this for one column and then for another, the first column gets reset to all False. For example:
In [170]: df['racedif']=False
In [171]: df['racedif'][~ df.newpers]=df.ptdtrace[~ df.newpers]!=df.ptdtrace.groupby(df.personid).apply(pd.Series.shift)[~ df.newpers]
In [172]: df.racedif.sum()
Out[172]: 28
In [173]: df.sexdif.sum()
Out[173]: 0
In [174]: df['sexdif']=False
In [175]: df['sexdif'][~ df.newpers]=df.pesex[~ df.newpers]!=df.pesex.groupby(df.personid).apply(pd.Series.shift)[~ df.newpers]
In [176]: df.sexdif.sum()
Out[176]: 31
In [177]: df.racedif.sum()
Out[177]: 0
But if I initialize both columns to False before setting any values, this does not happen:
In [203]: df['sexdif']=False
...: df['racedif']=False
...: df['sexdif'][~ df.newpers]=df.pesex[~ df.newpers]!=df.pesex.groupby(df.personid).apply(pd.Series.shift)[~ df.newpers]
...: df['racedif'][~ df.newpers]=df.ptdtrace[~ df.newpers]!=df.ptdtrace.groupby(df.personid).apply(pd.Series.shift)[~ df.newpers]
...:
In [204]: df.sexdif.sum()
Out[204]: 31
In [205]: df.racedif.sum()
Out[205]: 28
Why is this happening, and is it a bug?
Edit: I added a simpler example that does not have the same problem. Why not?
In [255]: df.x=False
In [256]: df.x[df.is456]=df['truth'][df.is456]
In [257]: df.x
Out[257]:
0 False
1 False
2 False
3 False
4 True
5 True
6 True
7 False
8 False
9 False
Name: x, dtype: bool
In [258]: df.y=False
In [259]: df.y[df.is456]=df['truth'][df.is456]
In [260]: df.y
Out[260]:
0 False
1 False
2 False
3 False
4 True
5 True
6 True
7 False
8 False
9 False
Name: y, dtype: bool
In [261]: df.x
Out[261]:
0 False
1 False
2 False
3 False
4 True
5 True
6 True
7 False
8 False
9 False
Name: x, dtype: bool
With .loc (an attempt at non-chained indexing), the same thing happens:
In [281]: df.loc[:,'sexdif']=False
In [282]: df.sexdif.sum()
Out[282]: 0
In [283]: df.loc[:,'sexdif'][~ df.newpers]=df.pesex[~ df.newpers]!=df.pesex.groupby(df.personid).apply(pd.Series.shift)[~ df.newpers]
In [284]: df.sexdif.sum()
Out[284]: 31
In [285]: df.loc[:,'racedif']=False
In [286]: df.sexdif.sum()
Out[286]: 0
Upvotes: 0
Reputation: 129018
You are using chained indexing; see the docs here: http://pandas-docs.github.io/pandas-docs-travis/indexing.html#indexing-view-versus-copy
The bottom line is to use
df.loc[row_indexer,col_indexer] = value
to assign and not
df[col_indexer][row_indexer] = value
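As a rough sketch of how that advice might apply to the question, the snippet below builds a small made-up DataFrame (the data values and the helper `changed_within_person` are hypothetical; only the column names come from the question) and sets each flag column with a single `.loc[rows, column]` assignment instead of `df[column][rows]`.

```python
import pandas as pd

# Hypothetical toy data; only the column names mirror the question.
df = pd.DataFrame({
    "personid": [1, 1, 1, 2, 2, 3],
    "pesex":    [1, 1, 2, 2, 2, 1],
    "ptdtrace": [1, 2, 2, 3, 3, 1],
    "newpers":  [True, False, False, True, False, True],
})

# Rows that are NOT the first record of a person.
mask = ~df["newpers"]

def changed_within_person(col):
    """Illustrative helper: True where `col` differs from the previous row of the same person."""
    prev = df.groupby("personid")[col].shift()
    return df[col] != prev

# One .loc call per assignment: rows and column are selected together,
# so pandas writes into df itself rather than into a possible temporary copy.
df["racedif"] = False
df.loc[mask, "racedif"] = changed_within_person("ptdtrace")[mask]

df["sexdif"] = False
df.loc[mask, "sexdif"] = changed_within_person("pesex")[mask]

racedif_total = int(df["racedif"].sum())
sexdif_total = int(df["sexdif"].sum())
print(racedif_total, sexdif_total)
```

On this toy data each column ends up with exactly one True, and racedif keeps its value after sexdif is added, which is the behavior the question was after.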
Upvotes: 3