khyox

Reputation: 1326

pandas DataFrame dropped column appearing again

Sorry if I am doing something stupid, but I am very puzzled by this issue: I pass a DataFrame to a function, and inside that function I add a column and then drop it. Nothing strange so far, but after the function has finished, the DataFrame in the global namespace still shows the added and dropped column. If I declare the DataFrame as global, this does not happen.

This test code shows the issue in all four combinations of Python 3.3.3/2.7.6 and pandas 0.13.0/0.12.0:

#!/usr/bin/python
import pandas as pd

# FUNCTION DFcorr
def DFcorr(df):
    # Calculate column of accumulated elements
    df['SUM']=df.sum(axis=1)
    print('DFcorr: DataFrame after add column:')
    print(df)
    # Drop column of accumulated elements
    df=df.drop('SUM',axis=1)
    print('DFcorr: DataFrame after drop column:')
    print(df)  

# FUNCTION globalDFcorr
def globalDFcorr():
    global C
    # Calculate column of accumulated elements
    C['SUM']=C.sum(axis=1)
    print('globalDFcorr: DataFrame after add column:')
    print(C)
    # Drop column of accumulated elements
    print('globalDFcorr: DataFrame after drop column:')
    C=C.drop('SUM',axis=1)
    print(C)  

######################### MAIN #############################
C = pd.DataFrame.from_items([('A', [1, 2]), ('B', [3 ,4])], orient='index', columns=['one', 'two'])
print('\nMAIN: Initial DataFrame:')
print(C)
DFcorr(C)
print('MAIN: DataFrame after call to DFcorr')
print(C)

C = pd.DataFrame.from_items([('A', [1, 2]), ('B', [3 ,4])], orient='index', columns=['one', 'two'])
print('\nMAIN: Initial DataFrame:')
print(C)
globalDFcorr()
print('MAIN: DataFrame after call to globalDFcorr')
print(C)

And here is the output:

MAIN: Initial DataFrame:
   one  two
A    1    2
B    3    4

[2 rows x 2 columns]
DFcorr: DataFrame after add column:
   one  two  SUM
A    1    2    3
B    3    4    7

[2 rows x 3 columns]
DFcorr: DataFrame after drop column:
   one  two
A    1    2
B    3    4

[2 rows x 2 columns]
MAIN: DataFrame after call to DFcorr
   one  two  SUM
A    1    2    3
B    3    4    7

[2 rows x 3 columns]

MAIN: Initial DataFrame:
   one  two
A    1    2
B    3    4

[2 rows x 2 columns]
globalDFcorr: DataFrame after add column:
   one  two  SUM
A    1    2    3
B    3    4    7

[2 rows x 3 columns]
globalDFcorr: DataFrame after drop column:
   one  two
A    1    2
B    3    4

[2 rows x 2 columns]
MAIN: DataFrame after call to globalDFcorr
   one  two
A    1    2
B    3    4

[2 rows x 2 columns]

What am I missing? Many thanks!

Upvotes: 3

Views: 4241

Answers (1)

unutbu

Reputation: 880239

Note this line in DFcorr:

df=df.drop('SUM',axis=1)

The df.drop method returns a new DataFrame. It does not mutate the original df.

Inside DFcorr, df is just a local variable. Assignments to df do not affect the global variable C. Only mutations of df would affect C.
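
The difference between mutating and rebinding can be seen in a minimal sketch. The function name `rebind` and the variable names are hypothetical, and the DataFrame is built with the plain constructor rather than `from_items`, which recent pandas versions no longer provide:

```python
import pandas as pd

def rebind(df):
    # Item assignment mutates the object the caller passed in.
    df["SUM"] = df.sum(axis=1)
    # drop() returns a new DataFrame; this only rebinds the local name df.
    df = df.drop("SUM", axis=1)
    return df

C = pd.DataFrame({"one": [1, 3], "two": [2, 4]}, index=["A", "B"])
result = rebind(C)

caller_columns = list(C.columns)       # the caller's object still has SUM
result_columns = list(result.columns)  # the new object does not
same_object = result is C              # two distinct objects
print(caller_columns, result_columns, same_object)
```

The column was added to the very object that `C` names, while the drop produced a separate object that only the local name `df` referred to.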

So, you could make DFcorr behave more like globalDFcorr by changing that line to:

df.drop('SUM',axis=1, inplace=True)
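
As a rough sketch of the two options, the first function below applies the in-place drop, and the second is an alternative (not part of the suggestion above) that leaves the caller's DataFrame untouched by working on a copy and returning it. The names `df_corr_inplace` and `df_corr_copy` are hypothetical:

```python
import pandas as pd

def df_corr_inplace(df):
    # Add the helper column, then remove it from the same object.
    df["SUM"] = df.sum(axis=1)
    df.drop("SUM", axis=1, inplace=True)

def df_corr_copy(df):
    # Work on a copy so the caller's DataFrame is never modified.
    work = df.copy()
    work["SUM"] = work.sum(axis=1)
    return work

C = pd.DataFrame({"one": [1, 3], "two": [2, 4]}, index=["A", "B"])

df_corr_inplace(C)
after_inplace = list(C.columns)

with_sum = df_corr_copy(C)
after_copy = list(C.columns)
copy_columns = list(with_sum.columns)
print(after_inplace, after_copy, copy_columns)
```

Which one fits depends on whether the caller expects its DataFrame to be modified; returning a new object tends to make that explicit at the call site.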

Upvotes: 4
