Katrinachanlh

Reputation: 5

Python: why is my original data also updated when I perform operations on a new variable?

I am new to Python. I wanted to try some simple function operations on a DataFrame, but I ran into the following problem. My code is:

>>> df.head(3)
   PercChange
0    0.000000
1   -7.400653
2    2.176843
>>> def switch(array):
...     for i in range(len(array)):
...         if array[i]<0:
...             array[i]=0
...     return array
... 
>>> a=df.PercChange
>>> a=switch(a)
>>> df['PosPercChange']=a
>>> df.head(3)
   PercChange  PosPercChange
0    0.000000       0.000000
1    0.000000       0.000000
2    2.176843       2.176843

Why did my 'PercChange' column change as well? I had already created a new variable for the operations. How can I avoid changing my 'PercChange' column? Thanks a lot.

[Solved]

So it is a problem with how the data is referenced. In Python, '=' assignment doesn't copy a value from one variable to another; instead, it gives the same sequence a second name, so changing it through one name also changes it under the other. Thanks for the help.

Upvotes: 0

Views: 72

Answers (1)

abarnert

Reputation: 365925

When you assign a value to a variable in Python, it doesn't copy the value; the variable just becomes a new name for the same value.

So, a and df.PercChange are just different names for the exact same Series. The same way a change to "Star Wars V" affects "The Empire Strikes Back" or a change to "Former President George W. Bush" affects "President Bush 43", a change to a affects df.PercChange.

And calling a function is just assignment again: the parameter inside the function becomes another name for the same value as the argument in the function call, so array is the same object as a and df.PercChange.
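The same name sharing can be seen with a plain list, which keeps the illustration independent of Pandas. This is only a minimal sketch; the names values, alias, and same_object are made up for the example:

```python
values = [0.0, -7.4, 2.2]
alias = values                  # no copy is made: two names, one list


def same_object(param, other):
    # param is just another name for the argument the caller passed in
    return param is other


alias[1] = 0.0                  # a change made through one name...
print(values)                   # ...is visible through the other: [0.0, 0.0, 2.2]
print(same_object(alias, values))   # True
```

The is operator compares identity rather than equality, so it is a quick way to check whether two names refer to one object.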

If you want to make a into a name for a copy of the same data as df.PercChange, instead of a name for the same object, you have to ask for that copy explicitly.


With Pandas, this is usually just the copy method:

a = df.PercChange.copy()    
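Applied to the question's data, that might look like the sketch below. The DataFrame is rebuilt here from the three rows shown in the question, and the boolean-mask assignment stands in for the loop; both are assumptions made for illustration:

```python
import pandas as pd

# Rebuild the first three rows from the question
df = pd.DataFrame({"PercChange": [0.0, -7.400653, 2.176843]})

a = df["PercChange"].copy()     # a now names an independent Series
a[a < 0] = 0                    # modify only the copy
df["PosPercChange"] = a

print(df)                       # PercChange keeps its negative value
```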

But Pandas (and the NumPy library that underlies it) also lets objects share data through views and slices, so there are other, more complicated ways to copy things.


More generally, Python has the copy module, with copy and deepcopy functions that can make shallow or deep copies of almost anything, not just Pandas Series.
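As a rough illustration of the difference between the two functions, here is a small sketch using a nested list (the variable names are invented for the example):

```python
import copy

nested = [[1, -2], [3, -4]]

shallow = copy.copy(nested)       # new outer list, same inner lists
deep = copy.deepcopy(nested)      # new outer list and new inner lists

shallow[0][1] = 0                 # also changes nested[0], which is shared
deep[1][1] = 0                    # leaves nested[1] alone

print(nested)                     # [[1, 0], [3, -4]]
print(shallow[0] is nested[0])    # True
print(deep[0] is nested[0])       # False
```

A shallow copy is enough when the elements are immutable values such as numbers; a deep copy matters once the container holds other mutable objects.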


But you're also halfway to a different solution. Your switch function does a return array at the end, and your caller does a = switch(a).

If switch returned a different object, a would now be a name for that different object. But, because it instead just returns its parameter, after modifying it in-place, all that a = switch(a) is doing is re-asserting a as a name for the same value it's already a name for.
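A quick way to see this is to check identity after the call. The sketch below uses a plain list and a hypothetical switch_in_place function that mirrors the question's switch:

```python
def switch_in_place(seq):
    # Modifies the caller's object and hands the very same object back
    for i, x in enumerate(seq):
        if x < 0:
            seq[i] = 0
    return seq


original = [0.0, -7.4, 2.2]
a = original
a = switch_in_place(a)      # rebinds a to the object it already named

print(a is original)        # True
print(original)             # [0.0, 0, 2.2]
```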

So, another way to fix things is to do the copying inside switch:

def switch(array):
    array = array.copy()
    for i in range(len(array)):
        if array[i]<0:
            array[i]=0
    return array

… or to build up a whole new array or Series and return that:

def switch(array):
    return array.apply(lambda x: 0 if x<0 else x)
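For this particular operation, a vectorized call can also build the new Series without a Python-level loop or lambda. The following is one possible sketch using Series.clip; the function name switch_clipped and the sample DataFrame are assumptions for illustration:

```python
import pandas as pd


def switch_clipped(series):
    # clip returns a new Series with values below 0 raised to 0;
    # the Series passed in is left unchanged
    return series.clip(lower=0)


df = pd.DataFrame({"PercChange": [0.0, -7.400653, 2.176843]})
df["PosPercChange"] = switch_clipped(df["PercChange"])

print(df)
```

Because the function returns a new object instead of modifying its parameter, the caller does not need to make a copy first.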

Upvotes: 1
