Reputation: 21513
Here is the problem. I use a function to return randomized data:
import numpy as np
import pandas as pd

data1 = [3,5,7,3,2,6,1,6,7,8]
data2 = [1,5,2,1,6,4,3,2,7,8]
df = pd.DataFrame(data1, columns = ['c1'])
df['c2'] = data2

def randomize_data(df):
    # add uniform noise in [0, 1) to every value of c1
    df['c1_ran'] = df['c1'].apply(lambda x: (x + np.random.uniform(0,1)))
    df['c1'] = df['c1_ran']
    # df.drop(['c1_ran'], axis=1, inplace=True)
    return df

temp_df = randomize_data(df)
display(df)       # display() is available in Jupyter/IPython
display(temp_df)
However, df (the source data) and temp_df (the randomized data) turn out to be the same.
How can I make temp_df and df different from each other?
I find I can get rid of the problem by adding df.copy() at the beginning of the function:
def randomize_data(df):
    df = df.copy()
But I'm not sure whether this is the right way to deal with it.
Upvotes: 2
Views: 2222
Reputation: 97261
Use DataFrame.assign():
def randomize_data(df):
    return df.assign(c1=df.c1 + np.random.uniform(0, 1, df.shape[0]))
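For anyone who wants to check this, here is a minimal sketch showing that the assign() version leaves the caller's frame alone. The shortened data and the names original_c1 and noise are only for illustration and are not from the question.

```python
import numpy as np
import pandas as pd

# Shortened illustrative data (not the full lists from the question)
df = pd.DataFrame({"c1": [3, 5, 7, 3, 2], "c2": [1, 5, 2, 1, 6]})
original_c1 = df["c1"].copy()  # snapshot to compare against afterwards


def randomize_data(df):
    # One uniform draw in [0, 1) per row
    noise = np.random.uniform(0, 1, len(df))
    # assign() returns a new DataFrame; the frame passed in is not modified
    return df.assign(c1=df["c1"] + noise)


temp_df = randomize_data(df)

print(temp_df is df)                 # False: a separate object is returned
print(df["c1"].equals(original_c1))  # True: the source column is untouched
```

The function in the question writes to df['c1'] on the very object that was passed in and then returns that same object, which is why both names showed the randomized values.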
Upvotes: 1
Reputation: 1
I think you are right. DataFrame.copy() also has an optional argument, deep. You can find the details at http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.copy.html
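As a rough sketch of what the default deep=True means in practice (the small frame and the name deep_copy are made up for illustration):

```python
import pandas as pd

# Small illustrative frame (not the data from the question)
df = pd.DataFrame({"c1": [3, 5, 7], "c2": [1, 5, 2]})

# deep=True is the default: the data and index are copied,
# so writing to the copy does not touch the original
deep_copy = df.copy(deep=True)
deep_copy.loc[0, "c1"] = 100

print(df.loc[0, "c1"])         # 3
print(deep_copy.loc[0, "c1"])  # 100
```

With deep=False the copy initially shares its data with the original. Whether a later write shows up in both depends on the pandas version and its copy-on-write setting, so the default deep copy is probably the safer choice for the function in the question.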
Upvotes: 0