Reputation: 1326
Sorry if I am doing something stupid, but I am very puzzled by this issue: I pass a DataFrame to a function, and inside that function I add a column and then drop it. Nothing strange so far, but after the function has returned, the DataFrame in the global namespace still shows the added-and-dropped column. If I declare the DataFrame as global inside the function instead, this does not happen.
This test code shows the issue in all four combinations of Python 3.3.3/2.7.6 and pandas 0.13.0/0.12.0:
#!/usr/bin/python
import pandas as pd

# FUNCTION DFcorr
def DFcorr(df):
    # Calculate column of accumulated elements
    df['SUM']=df.sum(axis=1)
    print('DFcorr: DataFrame after add column:')
    print(df)
    # Drop column of accumulated elements
    df=df.drop('SUM',axis=1)
    print('DFcorr: DataFrame after drop column:')
    print(df)

# FUNCTION globalDFcorr
def globalDFcorr():
    global C
    # Calculate column of accumulated elements
    C['SUM']=C.sum(axis=1)
    print('globalDFcorr: DataFrame after add column:')
    print(C)
    # Drop column of accumulated elements
    print('globalDFcorr: DataFrame after drop column:')
    C=C.drop('SUM',axis=1)
    print(C)

######################### MAIN #############################
C = pd.DataFrame.from_items([('A', [1, 2]), ('B', [3 ,4])], orient='index', columns=['one', 'two'])
print('\nMAIN: Initial DataFrame:')
print(C)
DFcorr(C)
print('MAIN: DataFrame after call to DFcorr')
print(C)

C = pd.DataFrame.from_items([('A', [1, 2]), ('B', [3 ,4])], orient='index', columns=['one', 'two'])
print('\nMAIN: Initial DataFrame:')
print(C)
globalDFcorr()
print('MAIN: DataFrame after call to globalDFcorr')
print(C)
And here is the output:
MAIN: Initial DataFrame:
   one  two
A    1    2
B    3    4

[2 rows x 2 columns]
DFcorr: DataFrame after add column:
   one  two  SUM
A    1    2    3
B    3    4    7

[2 rows x 3 columns]
DFcorr: DataFrame after drop column:
   one  two
A    1    2
B    3    4

[2 rows x 2 columns]
MAIN: DataFrame after call to DFcorr
   one  two  SUM
A    1    2    3
B    3    4    7

[2 rows x 3 columns]

MAIN: Initial DataFrame:
   one  two
A    1    2
B    3    4

[2 rows x 2 columns]
globalDFcorr: DataFrame after add column:
   one  two  SUM
A    1    2    3
B    3    4    7

[2 rows x 3 columns]
globalDFcorr: DataFrame after drop column:
   one  two
A    1    2
B    3    4

[2 rows x 2 columns]
MAIN: DataFrame after call to globalDFcorr
   one  two
A    1    2
B    3    4

[2 rows x 2 columns]
What am I missing? Many thanks!
Upvotes: 3
Views: 4241
Reputation: 880239
Note this line in DFcorr:
df=df.drop('SUM',axis=1)
By default, the df.drop method returns a new DataFrame. It does not mutate the original df.
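Inside DFcorr, df is just a local variable. Assignments to df do not affect the global variable C. Only mutations of df would affect C. The earlier df['SUM']=df.sum(axis=1) is such a mutation, which is why the SUM column still shows up in C after the call.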
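The difference between rebinding a local name and mutating the shared object can be checked directly. The following is only a minimal sketch: the function names rebind_only and mutate are made up for illustration, and it builds the frame with the plain DataFrame constructor instead of from_items so that it also runs on recent pandas versions.

```python
import pandas as pd

def rebind_only(df):
    # Hypothetical helper: drop() returns a new object and the local name
    # df is rebound to it; the caller's DataFrame is not touched.
    df = df.drop('one', axis=1)
    return df

def mutate(df):
    # Hypothetical helper: item assignment modifies the very object
    # that the caller also refers to.
    df['SUM'] = df.sum(axis=1)

C = pd.DataFrame({'one': [1, 3], 'two': [2, 4]}, index=['A', 'B'])

result = rebind_only(C)
print(list(C.columns))   # ['one', 'two']  (C is unchanged)
print(result is C)       # False           (drop produced a different object)

mutate(C)
print(list(C.columns))   # ['one', 'two', 'SUM']  (C was modified in place)
```
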
So, you could make DFcorr behave more like globalDFcorr by changing that line to:
df.drop('SUM',axis=1, inplace=True)
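As a rough sketch of how this plays out, here are two variants side by side. The names DFcorr_inplace and DFcorr_copy are hypothetical and not from the question; the second variant is an alternative, assuming the caller should never see the temporary column at all, that works on a copy and returns it.

```python
import pandas as pd

def DFcorr_inplace(df):
    # Add the helper column, then remove it from the same object,
    # so the caller's DataFrame ends up with its original columns.
    df['SUM'] = df.sum(axis=1)
    df.drop('SUM', axis=1, inplace=True)

def DFcorr_copy(df):
    # Alternative (an assumption about what is wanted): leave the
    # caller's DataFrame alone and return a modified copy instead.
    work = df.copy()
    work['SUM'] = work.sum(axis=1)
    return work

C = pd.DataFrame({'one': [1, 3], 'two': [2, 4]}, index=['A', 'B'])

DFcorr_inplace(C)
print(list(C.columns))        # ['one', 'two']

with_sum = DFcorr_copy(C)
print(list(C.columns))        # ['one', 'two']
print(list(with_sum.columns)) # ['one', 'two', 'SUM']
```

Returning a new object makes the data flow explicit at the call site, at the cost of one extra copy of the data.
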
Upvotes: 4