ZK Zhao

Reputation: 21513

Python & Pandas: How to return a copy of a dataframe?

Here is the problem. I use a function to return a randomized version of my data:

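import numpy as np
import pandas as pd
from IPython.display import display  # display() is also available by default in Jupyter

data1 = [3,5,7,3,2,6,1,6,7,8]
data2 = [1,5,2,1,6,4,3,2,7,8]
df = pd.DataFrame(data1, columns = ['c1'])
df['c2'] = data2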

def randomize_data(df):
    df['c1_ran'] = df['c1'].apply(lambda x: (x + np.random.uniform(0,1)))
    df['c1']=df['c1_ran']
    # df.drop(columns=['c1_ran'], inplace=True)
    return df

temp_df = randomize_data(df)

display(df)
display(temp_df)

However, df (the source data) and temp_df (the randomized data) are the same: display(df) and display(temp_df) show identical output.

How can I make the temp_df and df different from each other?
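The reason is that the function receives a reference to the caller's DataFrame, not a copy, so every assignment inside it changes df itself, and return df hands back that same object. A minimal sketch, reusing the question's data and function, that confirms this with an identity check (the is comparison is added here for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'c1': [3, 5, 7, 3, 2, 6, 1, 6, 7, 8],
                   'c2': [1, 5, 2, 1, 6, 4, 3, 2, 7, 8]})

def randomize_data(df):
    # 'df' is the caller's object, not a copy, so these
    # assignments modify the caller's DataFrame in place
    df['c1_ran'] = df['c1'].apply(lambda x: x + np.random.uniform(0, 1))
    df['c1'] = df['c1_ran']
    return df

temp_df = randomize_data(df)

# Both names refer to the same object
print(temp_df is df)           # True
print('c1_ran' in df.columns)  # True: the original gained the helper column
```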


I found that I can avoid the problem by adding df = df.copy() at the beginning of the function:

def randomize_data(df):
    df = df.copy()  # work on a copy instead of the caller's DataFrame
    # ... rest of the function unchanged

But I'm not sure whether this is the right way to deal with it.
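With that change in place, the complete function might look like the sketch below, assuming the goal is to leave the source frame untouched. It also drops the helper column c1_ran, mirroring the commented-out line in the question, and uses drop(columns=...) because recent pandas versions no longer accept the axis as a positional argument:

```python
import numpy as np
import pandas as pd

def randomize_data(df):
    # Work on a copy so the caller's DataFrame is left untouched
    df = df.copy()
    df['c1_ran'] = df['c1'].apply(lambda x: x + np.random.uniform(0, 1))
    df['c1'] = df['c1_ran']
    df = df.drop(columns=['c1_ran'])  # remove the helper column
    return df

df = pd.DataFrame({'c1': [3, 5, 7, 3, 2, 6, 1, 6, 7, 8],
                   'c2': [1, 5, 2, 1, 6, 4, 3, 2, 7, 8]})
original = df.copy()  # snapshot taken before the call, for comparison

temp_df = randomize_data(df)

print(temp_df is df)        # False
print(df.equals(original))  # True: the source data is unchanged
```

The final check compares df against a snapshot taken before the call; it holds because every assignment inside the function targets the copy.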

Upvotes: 2

Views: 2222

Answers (2)

HYRY

Reputation: 97261

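Use DataFrame.assign(), which returns a new DataFrame with the added or replaced columns and leaves the original unchanged: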

import numpy as np

def randomize_data(df):
    return df.assign(c1=df.c1 + np.random.uniform(0, 1, df.shape[0]))
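A self-contained sketch of this approach with the question's data. Drawing one random number per row with np.random.uniform(0, 1, df.shape[0]) replaces the row-by-row apply from the question, and the checks at the end confirm that df is left unchanged:

```python
import numpy as np
import pandas as pd

def randomize_data(df):
    # assign() returns a new DataFrame; the caller's df is not modified
    return df.assign(c1=df.c1 + np.random.uniform(0, 1, df.shape[0]))

df = pd.DataFrame({'c1': [3, 5, 7, 3, 2, 6, 1, 6, 7, 8],
                   'c2': [1, 5, 2, 1, 6, 4, 3, 2, 7, 8]})
temp_df = randomize_data(df)

print(temp_df is df)      # False
print(df['c1'].tolist())  # [3, 5, 7, 3, 2, 6, 1, 6, 7, 8]
```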

Upvotes: 1

Mengyu Liu

Reputation: 1

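I think you are right. Note that DataFrame.copy() has an optional argument, deep, which defaults to True; a deep copy copies the data and indices, so modifying the copy does not affect the original. You can find details in http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.copy.html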
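A small sketch of the default behavior (deep=True), using a toy frame: editing the copy in place leaves the original alone. It deliberately avoids deep=False, because a shallow copy shares the original's data, and whether in-place edits show through then depends on the pandas version and its Copy-on-Write setting:

```python
import pandas as pd

df = pd.DataFrame({'c1': [3, 5, 7], 'c2': [1, 5, 2]})

# deep=True is the default: data and indices are copied
deep = df.copy()          # same as df.copy(deep=True)
deep.loc[0, 'c1'] = 100   # modify the copy in place

print(df.loc[0, 'c1'])    # 3: the original is unaffected
```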

Upvotes: 0
