Asif

Reputation: 65

Constructing a new dataframe and keeping the old one?

I am sorry for asking a naive question, but it's driving me crazy at the moment. I have a dataframe df1, and I am creating a new dataframe df2 from it, as follows:

import pandas as pd

def NewDF(df):
    df['sum'] = df['a'] + df['b']
    return df

df1 = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
print(df1)
df2 = NewDF(df1)
print(df1)

which gives

   a  b
0  1  4
1  2  5
2  3  6

   a  b  sum
0  1  4    5
1  2  5    7
2  3  6    9

Why am I losing the shape of df1 and getting a third column? How can I avoid this?

Upvotes: 3

Views: 59

Answers (2)

ALollz

Reputation: 59519

DataFrames are mutable, so you should either explicitly pass a copy to your function or make copying the input the first step in your function. Otherwise, just like with lists, any modifications your function makes also apply to the original.

Your options are:

def NewDF(df):
    # Work on a copy so the caller's DataFrame is not modified
    df = df.copy()
    df['sum'] = df['a'] + df['b']
    return df

df2 = NewDF(df1)

or

df2 = NewDF(df1.copy())
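
A third possibility, which is not one of the two options above, is to build the result with DataFrame.assign, which returns a new DataFrame instead of modifying the one passed in. The following is only a minimal sketch; the name new_df is a hypothetical replacement for NewDF.

```python
import pandas as pd

def new_df(df):
    # Hypothetical variant of NewDF: assign() returns a new DataFrame,
    # so the frame passed in by the caller is left untouched.
    return df.assign(sum=df['a'] + df['b'])

df1 = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
df2 = new_df(df1)

print(list(df1.columns))  # ['a', 'b']
print(list(df2.columns))  # ['a', 'b', 'sum']
```

With this variant, df1 keeps its two columns and only df2 has the sum column.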

Here we can see that everything in your original implementation refers to the same object:

import pandas as pd

def NewDF(df):
    print(id(df))
    df['sum'] = df['a'] + df['b']
    print(id(df))
    return df

df1 = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

print(id(df1))
#2242099787480

df2 = NewDF(df1)
#2242099787480
#2242099787480

print(id(df2))
#2242099787480
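
The comparison with lists made earlier can be sketched in the same way. This is an illustrative example only; add_total and add_total_copy are hypothetical helper names, not part of the question's code.

```python
def add_total(values):
    # Appends in place, so the caller's list changes too.
    values.append(sum(values))
    return values

def add_total_copy(values):
    # Copy first, then modify only the copy.
    values = list(values)
    values.append(sum(values))
    return values

nums = [1, 2, 3]
same = add_total(nums)
print(same is nums)   # True: both names refer to one list
print(nums)           # [1, 2, 3, 6]

other = add_total_copy(nums)
print(other is nums)  # False: a separate list was returned
print(nums)           # [1, 2, 3, 6] (unchanged by the second call)
```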

Upvotes: 3

anand_v.singh

Reputation: 2838

The third column that you are getting is the Index column. Every pandas DataFrame always maintains an Index; however, you can choose to leave it out of your output.

import pandas as pd

def NewDF(df):
    df['sum'] = df['a'] + df['b']
    return df

df1 = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
print(df1.to_string(index=False))
df2 = NewDF(df1)
print(df1.to_string(index=False))

This gives the output:

a  b
1  4
2  5
3  6
a  b  sum
1  4    5
2  5    7
3  6    9

Now you might ask why the index exists. The Index is actually backed by a hash table, which speeds up lookups and is a highly desirable feature in many contexts. If this was just a one-off question, this should be enough, but if you are looking to learn more about pandas, I would advise you to look into indexing. You can begin by looking here: https://stackoverflow.com/a/27238758/10953776
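
As a rough illustration of label-based lookup through the Index, here is a small sketch. The string labels 'x', 'y' and 'z' are assumptions made up for this example and do not appear in the question.

```python
import pandas as pd

# Same data as in the question, but with illustrative string labels
# as the index instead of the default 0, 1, 2.
df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]}, index=['x', 'y', 'z'])

row = df.loc['y']             # select a row by its index label
print(row['a'], row['b'])     # 2 5

print(df.index.get_loc('z'))  # 2, the position of label 'z'
```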

Upvotes: 0
