Reputation: 568
Edit: A simple mistake/bug caused some inexplicable problems. I have edited the question to keep only the part that can actually be explained and for which answers have been posted.
I am struggling to understand the following indexing behavior:
Suppose I have some pd.DataFrame:
In [18]: Df = pd.DataFrame(zip(list('abcde'),sp.randn(5)),index=range(5),columns=['label','val'])
In [19]: Df
Out[19]:
label val
0 a -0.705392
1 b 0.087682
2 c 1.519180
3 d 1.363852
4 e -0.004182
And I am trying to normalize all values of val by one of them, say c.
Intuitively I would write
Df['val'] / Df.loc[Df['label'] == 'c']['val']
But this triggers some broadcasting behavior I do not fully understand:
In [20]: Df['val'] / Df.loc[Df['label'] == 'c']['val']
Out[20]:
0 NaN
1 NaN
2 1.0
3 NaN
4 NaN
Name: val, dtype: float64
Why does this happen?
Upvotes: 2
Views: 217
Reputation: 101
[I think what happens here is that you are not allowed to divide Series by other Series directly - via the / operator. There is a pandas.Series.divide function for that.] EDIT: apparently you can and I am dumb.
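For what it is worth, the / operator and Series.div (Series.divide is an alias for it) behave the same way on two Series: both align on index labels before dividing. A minimal sketch, with made-up values standing in for the random ones in the question and illustrative variable names:

```python
import pandas as pd

# Made-up values; the question uses random numbers.
val = pd.Series([-0.7, 0.1, 1.5, 1.4, -0.004], name="val")
one = val.loc[[2]]  # a one-element Series that keeps index label 2

by_operator = val / one
by_method = val.div(one)

# Both forms align on the index, so both are NaN everywhere except label 2.
print(by_operator.equals(by_method))
```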
If you convert Df.loc[Df['label'] == 'c']['val'] to a float you will have no problem dividing a Series object by it:
foo = float(Df.loc[Df['label'] == 'c']['val'].iloc[0])
Df['val'] / foo
Upvotes: 0
Reputation: 7038
You're dividing a Series by a Series, which is causing pandas to align on index. If you look at the value produced via indexing:
df.loc[df['label'] == 'c']['val']
2 1.51918
Name: val, dtype: float64
... you'll see this is a Series. If you further index this Series:
df.loc[df['label'] == 'c']['val'][2]
1.51918
... we're now left with:
type(df.loc[df['label'] == 'c']['val'][2])
<class 'numpy.float64'>
And if we attempt dividing the whole val Series by this:
df.val / df.loc[df['label'] == 'c']['val'][2]
0 -0.464324
1 0.057717
2 1.000000
3 0.897755
4 -0.002753
Name: val, dtype: float64
... we have the expected behavior.
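The NaNs in the question come from that alignment step: the one-row Series only carries index label 2, so every other label has no partner on the right-hand side. A small sketch that makes this visible (the frame below uses made-up values and is an illustrative stand-in, not the question's data):

```python
import pandas as pd

# Made-up values standing in for the question's random data.
df = pd.DataFrame({"label": list("abcde"),
                   "val": [-0.7, 0.1, 1.5, 1.4, -0.004]})

divisor = df.loc[df["label"] == "c", "val"]  # Series with the single index label 2

# Reindexing the divisor to the full index shows what the division effectively sees:
# a value at label 2 and NaN at every other label.
aligned = divisor.reindex(df.index)
print(aligned)

result = df["val"] / divisor
print(result)
```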
Please note, this kind of messy chained indexing is NOT how you should be dividing a whole Series by a singular value...
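One way to avoid the chained indexing, sketched here under the assumption that exactly one row matches the label: select the row and the column in a single .loc call and pull the scalar out with .iloc[0] (or .item(), which raises if the selection does not hold exactly one element). The frame, values and variable names are illustrative.

```python
import pandas as pd

# Made-up values standing in for the question's random data.
df = pd.DataFrame({"label": list("abcde"),
                   "val": [-0.7, 0.1, 1.5, 1.4, -0.004]})

# Single .loc call with a row mask and a column label, then take the scalar.
ref = df.loc[df["label"] == "c", "val"].iloc[0]

# Series divided by a scalar: the scalar is applied to every row, no alignment involved.
normalized = df["val"] / ref
print(normalized)
```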
Upvotes: 3