Renaming columns for one dataframe renames for a second df

Question

I am trying to do two joins in pandas. I have two columns of image file names in one df (ListDf). One column is for old images, and the other is for updated images. I have a master list of all images from a database that includes the path to the images (ImageDf). I want to do a left join of ListDf and ImageDf to move the paths to the updated images into ListDf, and then repeat the left join, this time moving the paths of the old images into ListDf. I am doing this by making two copies of ImageDf.

OldImgList = ImageDf
NewImgList = ImageDf

Then I am trying to rename the columns of OldImgList and NewImgList so that, when the join occurs, they have the correct column names in ListDf:

OldImgList.columns = ['OldDateMod', 'OldFileName', 'OldPath']
NewImgList.columns = ['NewDateMod', 'NewFileName', 'NewPath']

However, when I run the first line of code, the column names for both OldImgList and NewImgList get set to the names in the first line. Then, when I run the second line, OldImgList takes the column names of NewImgList. What is going on here?

chris · Accepted Answer

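What you've done is create two references to the same dataframe. Assignment in Python does not copy an object; it only binds another name to it. OldImgList, NewImgList and ImageDf all point at one dataframe, so renaming the columns through any of those names changes them for all three.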
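Here is a small sketch of what is happening. The contents of ImageDf are made up for illustration (the question does not show the real data); only the three-column shape is taken from the question.

```python
import pandas as pd

# Hypothetical stand-in for ImageDf; the real data is not shown in the question.
ImageDf = pd.DataFrame({
    "DateMod": ["2020-01-01", "2020-01-02"],
    "FileName": ["a.jpg", "b.jpg"],
    "Path": ["/img/a.jpg", "/img/b.jpg"],
})

# Plain assignment: no data is copied, both names refer to the same object.
OldImgList = ImageDf
NewImgList = ImageDf

same_object = (OldImgList is NewImgList) and (OldImgList is ImageDf)
print(same_object)

# Renaming through one name is visible through every other name.
OldImgList.columns = ["OldDateMod", "OldFileName", "OldPath"]
columns_seen_via_new = list(NewImgList.columns)
columns_seen_via_master = list(ImageDf.columns)
print(columns_seen_via_new)
```

Checking `a is b` is a quick way to tell whether two variables are really the same dataframe.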

If you want to create two independent copies of a dataframe, you can use the df.copy() method:

OldImgList = ImageDf.copy(deep=True)
NewImgList = ImageDf.copy(deep=True)
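As a quick check that the copies really are independent, here is the same rename run against copies. Again, the sample rows are invented for illustration; deep=True is already the default for copy(), so it is left out here.

```python
import pandas as pd

# Hypothetical stand-in for ImageDf; the real data is not shown in the question.
ImageDf = pd.DataFrame({
    "DateMod": ["2020-01-01", "2020-01-02"],
    "FileName": ["a.jpg", "b.jpg"],
    "Path": ["/img/a.jpg", "/img/b.jpg"],
})

# copy() returns a new dataframe object each time.
OldImgList = ImageDf.copy()
NewImgList = ImageDf.copy()

OldImgList.columns = ["OldDateMod", "OldFileName", "OldPath"]
NewImgList.columns = ["NewDateMod", "NewFileName", "NewPath"]

# Each dataframe keeps its own column names; the original is untouched.
print(list(OldImgList.columns))
print(list(NewImgList.columns))
print(list(ImageDf.columns))
```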

I'm not following your sequence of joins, but there is likely a way to do it without making two copies of your dataframe (which will use more memory). For example, you can supply lsuffix or rsuffix arguments to df.join() to add a string suffix to the column names of the left or right dataframe, so that columns with the same name on both sides are distinguished from each other (see the DataFrame.join documentation for more).
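A rough sketch of how that could look, guessing at the setup: I am assuming ImageDf has columns DateMod, FileName and Path, and that ListDf has columns OldFileName and NewFileName. All of those names and the sample rows are assumptions, so adjust them to your data. The suffixes are only applied to column names that appear on both sides, which is why they are only needed on the second join.

```python
import pandas as pd

# Hypothetical data; the column names are assumptions, not taken from the question.
ImageDf = pd.DataFrame({
    "DateMod": ["2020-01-01", "2020-01-02", "2020-01-03"],
    "FileName": ["a.jpg", "b.jpg", "c.jpg"],
    "Path": ["/img/a.jpg", "/img/b.jpg", "/img/c.jpg"],
})
ListDf = pd.DataFrame({
    "OldFileName": ["a.jpg", "b.jpg"],
    "NewFileName": ["c.jpg", "a.jpg"],
})

# Index the master list by file name once, so it can be joined against twice.
lookup = ImageDf.set_index("FileName")

# First left join: bring in DateMod/Path for the updated images.
step1 = ListDf.join(lookup, on="NewFileName")

# Second left join: DateMod/Path now exist on both sides, so the suffixes
# tell them apart (left = updated image, right = old image).
result = step1.join(lookup, on="OldFileName", lsuffix="_new", rsuffix="_old")

print(list(result.columns))
```

ImageDf itself is never renamed here, so there is nothing to alias by accident. Note that set_index() still builds one new dataframe, so this is one extra object rather than two.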
