Pandas: Function is overwriting original DF even though I am manipulating a copy?

Question

I am creating a function to categorize the data in a df into bins. As a first step, the function extracts numbers from strings and replaces the column of text with a column of numbers.

The function is somehow overwriting the original dataframe, even though I am only manipulating a copy of it.

def categorizeColumns(df):

    newdf = df
 
    if 'Runtime' in newdf.columns:
        for row in range(len(newdf['Runtime'])):
            strRuntime = newdf['Runtime'][row]
            numsRuntime = [int(i) for i in strRuntime.split() if i.isdigit()]
            newdf.loc[row,'Runtime'] = numsRuntime[0]
    
    return newdf

import pandas as pd

df = pd.read_csv('moviesSeenRated.csv')
newdf = categorizeColumns(df)

The original df has a column of runtimes like [34 mins, 32 mins, 44 mins], and newdf should have [34, 32, 44], which it does. However, the original df also changes outside the function.
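For reference, the text-to-number conversion described above can also be done without a row loop. This is a minimal sketch, assuming every entry contains a number (the sample DataFrame is hypothetical), using pandas' `Series.str.extract`. Note that it matches the first run of digits anywhere in the string, which differs slightly from the `split()`/`isdigit()` approach in the question:

```python
import pandas as pd

# Hypothetical sample data mirroring the runtimes described in the question.
df = pd.DataFrame({"Runtime": ["34 mins", "32 mins", "44 mins"]})

# str.extract pulls the first run of digits out of each string;
# expand=False returns a Series instead of a one-column DataFrame.
runtimes = df["Runtime"].str.extract(r"(\d+)", expand=False).astype(int)

# assign() builds and returns a new DataFrame, so df itself is left unchanged.
newdf = df.assign(Runtime=runtimes)

print(newdf["Runtime"].tolist())  # [34, 32, 44]
print(df["Runtime"].tolist())     # ['34 mins', '32 mins', '44 mins']
```

Because `assign` returns a new DataFrame, this route leaves `df` untouched without an explicit copy.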

What's going on here? Any fixes? Thanks in advance.

EDIT: It seems I wasn't making a copy; I needed to do

newdf = df.copy()
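A minimal sketch of the function with that fix applied, keeping the original loop. The sample DataFrame is a hypothetical stand-in for `moviesSeenRated.csv`, and iterating over `newdf.index` with `.at` is a small assumed change from the original `range(len(...))` / `.loc` loop, so the function also works with a non-default index:

```python
import pandas as pd


def categorizeColumns(df):
    # df.copy() (deep=True by default) creates an independent DataFrame,
    # so the writes below never touch the caller's df.
    newdf = df.copy()

    if 'Runtime' in newdf.columns:
        # Iterate over the actual index labels rather than range(len(...)),
        # so this also works when the index is not 0..n-1.
        for idx in newdf.index:
            strRuntime = newdf.at[idx, 'Runtime']
            numsRuntime = [int(i) for i in strRuntime.split() if i.isdigit()]
            newdf.at[idx, 'Runtime'] = numsRuntime[0]

    return newdf


# Hypothetical stand-in for pd.read_csv('moviesSeenRated.csv').
df = pd.DataFrame({'Runtime': ['34 mins', '32 mins', '44 mins']})
newdf = categorizeColumns(df)
print(newdf['Runtime'].tolist())  # [34, 32, 44]
print(df['Runtime'].tolist())     # ['34 mins', '32 mins', '44 mins']
```

The `Runtime` column stays `object` dtype after the loop; `newdf['Runtime'].astype(int)` would convert it if a numeric dtype is needed for binning.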

Thank you all!

izhang05 · Accepted Answer

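The problem is that you aren't actually making a copy of the dataframe in the line newdf = df. Assignment only binds a second name to the same DataFrame object, so every change made through newdf also shows up in df. To make a copy, you could do newdf = df.copy().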
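A quick way to see the difference, using hypothetical sample data: plain assignment gives a second name for the same object, while `copy()` gives an independent one, so only writes through the alias show up in the original.

```python
import pandas as pd

df = pd.DataFrame({'Runtime': ['34 mins', '32 mins']})

alias = df          # plain assignment: a second name for the same object
copied = df.copy()  # a new, independent DataFrame

print(alias is df)   # True
print(copied is df)  # False

# Write through each name.
alias.loc[0, 'Runtime'] = 'changed via alias'
copied.loc[1, 'Runtime'] = 'changed via copy'

# Only the write through the alias reaches df.
print(df['Runtime'].tolist())  # ['changed via alias', '32 mins']
```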
