Reputation: 5
I am new to Python. I wanted to try some simple function operations on a DataFrame, but I ran into the following problem. My code is:
>>> df.head(3)
   PercChange
0    0.000000
1   -7.400653
2    2.176843
>>> def switch(array):
...     for i in range(len(array)):
...         if array[i]<0:
...             array[i]=0
...     return array
...
>>> a=df.PercChange
>>> a=switch(a)
>>> df['PosPercChange']=a
>>> df.head(3)
   PercChange  PosPercChange
0    0.000000       0.000000
1    0.000000       0.000000
2    2.176843       2.176843
Why did my 'PercChange' column change as well? I had already created a separate variable for the operation. How can I avoid changing my 'PercChange' column? Thanks a lot.
[Solved]
So the problem is with how the data is referenced. In Python, '=' assignment doesn't copy a value from one variable to another; instead, it gives the same sequence a second name, so changing it through one name also changes it through the other. Thanks for the help.
Upvotes: 0
Views: 72
Reputation: 365925
When you assign a value to a variable in Python, it doesn't copy the value; the variable just becomes a new name for the same value.
So, `a` and `df.PercChange` are just different names for the exact same `Series`. The same way a change to "Star Wars V" affects "The Empire Strikes Back" or a change to "Former President George W. Bush" affects "President Bush 43", a change to `a` affects `df.PercChange`.
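As a rough illustration of the aliasing described above, here is a minimal sketch that uses a plain Python list instead of a Series, so it runs without Pandas. The name `perc_change` is made up for the example and only stands in for the column.

```python
# Plain-list sketch of name aliasing; `perc_change` is an illustrative stand-in.
perc_change = [0.0, -7.400653, 2.176843]
a = perc_change          # no copy: `a` is a second name for the same list

a[1] = 0.0               # mutate the object through one name...

print(a is perc_change)  # True: both names refer to one object
print(perc_change)       # [0.0, 0.0, 2.176843]: ...and the other name sees it
```

The `is` operator checks object identity, which is a quick way to tell whether two names refer to the same object or to two equal-looking ones.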
And calling a function is just assignment again: the parameter inside the function becomes another name for the same value as the argument in the function call, so `array` is the same object as `a` and `df.PercChange`.
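To see that a parameter is just another name for the caller's object, here is a small sketch that again assumes a plain list in place of the Series. The return value is ignored on purpose, yet the caller's data still changes.

```python
def switch(array):
    # `array` is another name for whatever object the caller passed in.
    for i in range(len(array)):
        if array[i] < 0:
            array[i] = 0.0   # in-place mutation of the caller's object
    return array

a = [0.0, -7.400653, 2.176843]
switch(a)   # return value deliberately ignored
print(a)    # [0.0, 0.0, 2.176843]
```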
If you want to make `a` into a name for a copy of the same data as `df.PercChange`, instead of a name for the same object, you have to ask for that copy explicitly.
With Pandas, this is usually just the `copy` method:
a = df.PercChange.copy()
But Pandas (and the NumPy library that underlies it) allows for all kinds of complicated things, so there are other complicated ways to copy things.
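Here is a minimal sketch of the `copy` approach applied to the question's data. It assumes the frame has a single `PercChange` column holding the three values shown in the question, and it uses a boolean mask instead of the original loop, which is a substitution on my part.

```python
import pandas as pd

# Assumed reconstruction of the first three rows from the question.
df = pd.DataFrame({"PercChange": [0.0, -7.400653, 2.176843]})

a = df["PercChange"].copy()   # explicit copy: `a` is an independent Series
a[a < 0] = 0                  # zero out negatives in the copy only
df["PosPercChange"] = a

print(df["PercChange"].tolist())     # [0.0, -7.400653, 2.176843]
print(df["PosPercChange"].tolist())  # [0.0, 0.0, 2.176843]
```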
More generally, Python has the `copy` module, with `copy` and `deepcopy` functions that can make shallow or deep copies of almost anything, not just Pandas Series.
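The difference between shallow and deep copies only shows up with nested objects, so this sketch uses a hypothetical list of lists rather than a Series:

```python
import copy

nested = [[0.0, -7.400653], [2.176843]]

shallow = copy.copy(nested)      # new outer list, same inner lists
deep = copy.deepcopy(nested)     # new outer list and new inner lists

nested[0][1] = 0.0               # mutate an inner list of the original

print(shallow[0] is nested[0])   # True: the inner list is shared
print(shallow[0][1])             # 0.0: the shallow copy sees the change
print(deep[0][1])                # -7.400653: the deep copy does not
```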
But you're also halfway to a different solution. Your `switch` function does a `return array` at the end, and your caller does `a = switch(a)`.
If `switch` returned a different object, `a` would now be a name for that different object. But because it instead just returns its parameter, after modifying it in place, all that `a = switch(a)` is doing is re-asserting `a` as a name for the same value it's already a name for.
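The contrast between returning the parameter and returning a different object can be sketched like this. Both function names are invented for the example, and plain lists stand in for the Series.

```python
def switch_in_place(values):
    # Mutates its argument and returns that very same object.
    for i in range(len(values)):
        if values[i] < 0:
            values[i] = 0.0
    return values

def switch_new(values):
    # Builds and returns a brand-new list; the argument is left alone.
    return [0.0 if v < 0 else v for v in values]

original = [0.0, -7.400653, 2.176843]

fresh = switch_new(original)
print(fresh is original)   # False: a different object came back
print(original)            # [0.0, -7.400653, 2.176843]

same = switch_in_place(original)
print(same is original)    # True: the result is just the argument again
print(original)            # [0.0, 0.0, 2.176843]
```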
So, another way to fix things is to do the copying inside `switch`:
def switch(array):
    array = array.copy()
    for i in range(len(array)):
        if array[i]<0:
            array[i]=0
    return array
… or to build up a whole new array or Series and return that:
def switch(array):
    return array.apply(lambda x: 0 if x<0 else x)
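As one more variation on returning a new Series, here is a sketch that swaps the per-element lambda for the vectorized `Series.clip` method. This is my substitution, not something from the question, and the DataFrame is again an assumed reconstruction of the three rows shown there.

```python
import pandas as pd

def switch(series):
    # clip(lower=0) returns a new Series with values below 0 raised to 0;
    # the Series passed in is not modified.
    return series.clip(lower=0)

# Assumed reconstruction of the first three rows from the question.
df = pd.DataFrame({"PercChange": [0.0, -7.400653, 2.176843]})
df["PosPercChange"] = switch(df["PercChange"])

print(df["PercChange"].tolist())     # [0.0, -7.400653, 2.176843]
print(df["PosPercChange"].tolist())  # [0.0, 0.0, 2.176843]
```

On large columns this will typically be faster than `apply` with a Python lambda, since the work is done inside Pandas rather than one element at a time.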
Upvotes: 1