Joop

Reputation: 8108

Pandas Dataframe using .loc for assignment gives unexpected results

I am doing some calculations in pandas, and assignment with .loc is giving unexpected results. I'm not sure whether I am misusing the syntax or whether this is a bug.

df = pd.DataFrame(index=['series1', 'series2', 'series3'])
df['prev value/unit'] = [99,99,99]
df['value'] = [100,100,100]
df['units'] = [100,100,0]
df['value/unit'] = df['value']/df['units']

This creates a dataframe that contains some division-by-zero values, as shown below. Business logic dictates that if there is a division by zero, the prior value/unit should be used.

         prev value/unit  value  units  value/unit
series1               99    100    100    1.000000
series2               99    100    100    1.000000
series3               99    100      0         inf

so adding:

df.loc[df.units == 0, 'value/unit'] = df['prev value/unit']

has the desired effect and the inf above gets correctly overwritten by 99 (the previous per unit value).

However, if there are no divisions by zero, then

df.loc[df.units == 0, 'value/unit']
# is an empty Series
#Series([], name: value/unit, dtype: float64)

and assigning df['prev value/unit'] to it overwrites all the values!

so e.g.

df = pd.DataFrame(index=['series1', 'series2', 'series3'])
df['prev value/unit'] = [99,99,99]
df['value'] = [100,100,100]
df['units'] = [100,100,100]
df['value/unit'] = df['value']/df['units']
df.loc[df.units == 0, 'value/unit'] = df['prev value/unit']

gives:

         prev value/unit  value  units  value/unit
series1               99    100    100          99
series2               99    100    100          99
series3               99    100    100          99

which is totally unexpected. Did I accidentally misuse the .loc syntax, or is this a bug? I am specifically using .loc to avoid assigning to temporary views of the dataframe. For reference, I am using pandas 0.13.1.

Upvotes: 4

Views: 10331

Answers (1)

chrisb

Reputation: 52236

I'm assuming it has something to do with views/copies, but it certainly seems like unexpected behavior, so you might open an issue on GitHub.

https://github.com/pydata/pandas/issues
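If you want to keep the .loc form in the meantime, one possible workaround is to skip the assignment when nothing matches and to select the same rows on the right-hand side. This is only a sketch: it assumes the problem is triggered by the empty selection, which I have not confirmed on 0.13.1, and the name `mask` is just for illustration.

```python
import pandas as pd

df = pd.DataFrame(index=['series1', 'series2', 'series3'])
df['prev value/unit'] = [99, 99, 99]
df['value'] = [100, 100, 100]
df['units'] = [100, 100, 100]
df['value/unit'] = df['value'] / df['units']

# Boolean mask of the rows that divided by zero (none in this example).
mask = df['units'] == 0

# Assumption: guarding on mask.any() avoids the empty-selection assignment.
# The right-hand side is restricted to the same rows as the left-hand side.
if mask.any():
    df.loc[mask, 'value/unit'] = df.loc[mask, 'prev value/unit']
```

With no zero units, the guard means nothing is assigned and value/unit keeps the computed ratios.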

An alternative way to write the code would be using numpy.where, e.g.

In [86]: import numpy as np
In [87]: df['value/unit'] = np.where(df['units'] == 0, df['prev value/unit'], df['value']/df['units'])

In [88]: df
Out[88]: 
         prev value/unit  value  units  value/unit
series1               99    100    100           1
series2               99    100    100           1
series3               99    100    100           1
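The same fallback can probably also be written without numpy, using Series.where, which keeps a value where the condition is True and takes the replacement elsewhere. A rough sketch, where the helper name `value_per_unit` is made up for this example and the data is the question's div-by-zero case:

```python
import pandas as pd

def value_per_unit(frame):
    """Return value/units, falling back to the previous ratio where units == 0.

    Hypothetical helper; the column names are the ones from the question.
    """
    ratio = frame['value'] / frame['units']
    # Keep the ratio where units is non-zero, otherwise use the previous value.
    return ratio.where(frame['units'] != 0, frame['prev value/unit'])

df = pd.DataFrame(
    {
        'prev value/unit': [99, 99, 99],
        'value': [100, 100, 100],
        'units': [100, 100, 0],
    },
    index=['series1', 'series2', 'series3'],
)
df['value/unit'] = value_per_unit(df)
```

Like the numpy.where version, this computes the whole column in one step, so it never assigns into an empty selection.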

Upvotes: 5
