grg rsr

Reputation: 568

unexpected pandas broadcasting behavior

edit: A simple mistake/bug caused some inexplicable problems. I edited the question to leave only the part that can actually be explained and for which answers have been posted.


I am struggling to understand the following indexing behavior: Suppose I have some pd.DataFrame:

In [18]: Df = pd.DataFrame(zip(list('abcde'),sp.randn(5)),index=range(5),columns=['label','val'])

In [19]: Df
Out[19]: 
  label       val
0     a -0.705392
1     b  0.087682
2     c  1.519180
3     d  1.363852
4     e -0.004182

And I am trying to normalize all values of val by one of them, say the one labeled c. Intuitively I would write

Df['val'] / Df.loc[Df['label'] == 'c']['val']

But this triggers some broadcasting behavior I do not fully understand:

In [20]: Df['val'] / Df.loc[Df['label'] == 'c']['val']
Out[20]: 
0    NaN
1    NaN
2    1.0
3    NaN
4    NaN
Name: val, dtype: float64

Why does this happen?

Upvotes: 2

Views: 217

Answers (2)

galicae

Reputation: 101

[I think what happens here is that you are not allowed to divide Series by other Series directly - via the / operator. There is a pandas.Series.divide function for that.] EDIT: apparently you can and I am dumb.

If you convert Df.loc[Df['label'] == 'c']['val'] to a float you will have no problem dividing a Series object by it:

foo = float(Df.loc[Df['label'] == 'c']['val'])
Df['val'] / foo
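
A hedged variation on the same idea: as far as I know, recent pandas versions emit a deprecation warning when float() is called on a one-element Series, so pulling the scalar out explicitly may be safer. The sketch below uses a stand-in Df with the values printed in the question (hard-coded here as an assumption, since the original data was random) and Series.item(), which raises if the selection does not hold exactly one value.

```python
import pandas as pd

# Stand-in for the question's Df, with the printed values hard-coded (assumption).
Df = pd.DataFrame({'label': list('abcde'),
                   'val': [-0.705392, 0.087682, 1.519180, 1.363852, -0.004182]})

# Select row and column in a single .loc call; the result is still a Series.
match = Df.loc[Df['label'] == 'c', 'val']

# .item() returns the lone element as a Python scalar and raises ValueError
# if the selection does not contain exactly one value.
foo = match.item()

# Series / scalar: the scalar is applied to every row, no index alignment involved.
normalized = Df['val'] / foo
print(normalized)
```

The ValueError is arguably a feature here: if two rows were labeled 'c', the division would fail loudly instead of silently picking one of them.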

Upvotes: 0

Andrew L

Reputation: 7038

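You're dividing a Series by a Series, which causes pandas to align the two operands on their index. Only label 2 appears in both, so every other row becomes NaN. If you look at the value produced via indexing (df here is the question's Df):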

df.loc[df['label'] == 'c']['val']
2    1.51918
Name: val, dtype: float64

... you'll see this is a Series. If you further index this Series:

df.loc[df['label'] == 'c']['val'][2]
1.51918

... we're now left with:

type(df.loc[df['label'] == 'c']['val'][2])
<class 'numpy.float64'>

And if we attempt dividing the whole val Series by this:

df.val / df.loc[df['label'] == 'c']['val'][2]
0   -0.464324
1    0.057717
2    1.000000
3    0.897755
4   -0.002753
Name: val, dtype: float64

... we have the expected behavior.
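
To see the alignment in isolation, here is a minimal sketch with two made-up Series (the names a and b and their values are illustrative only, not taken from the question). It contrasts Series / Series, which matches rows by index label, with Series / scalar, which applies the scalar to every row.

```python
import pandas as pd

a = pd.Series([10.0, 20.0, 30.0], index=[0, 1, 2])
b = pd.Series([5.0], index=[1])   # a one-element Series labeled 1

# Series / Series aligns on index labels: only label 1 exists in both,
# so labels 0 and 2 have no partner and come out as NaN.
aligned = a / b
print(aligned)

# Series / scalar broadcasts the scalar over every element.
broadcast = a / b.iloc[0]
print(broadcast)
```

So what looks like odd broadcasting in the question is really label alignment: a one-element Series is not treated like a scalar.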

Please note, this kind of messy chained indexing is NOT how you should divide a whole Series by a single value...
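
One possible cleaner route, offered as a sketch rather than the definitive way: index the values by label first, so the reference value becomes a plain scalar lookup. The df below is rebuilt from the values printed in the question (an assumption, since the original data was random), and the approach assumes the labels are unique.

```python
import pandas as pd

# Stand-in for the question's frame, with the printed values hard-coded (assumption).
df = pd.DataFrame({'label': list('abcde'),
                   'val': [-0.705392, 0.087682, 1.519180, 1.363852, -0.004182]})

# Use the labels as the index so that vals['c'] is a scalar lookup.
# Assumes labels are unique; with duplicates, vals['c'] would be a Series again.
vals = df.set_index('label')['val']

# Series / scalar: every value is divided by the value labeled 'c'.
by_label = vals / vals['c']
print(by_label)

# Optionally turn the result back into a two-column frame with an integer index.
result = by_label.reset_index()
```

Note that by_label is indexed by label rather than by the original integers 0 to 4; reset_index() brings back a default integer index if that matters.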

Upvotes: 3
