Reputation: 79
I am trying to do two joins in pandas. I have two columns of image file names in one df (ListDf). One column is for old images, and the other is for updated images. I have a master list of all images from a database that includes the path to the images (ImageDf). I want to do a left join of ListDf and ImageDf to move the paths to the updated images into ListDf, and then repeat the left join, this time moving the paths of the old images into ListDf. I am doing this by making two copies of ImageDf:
OldImgList = ImageDf
NewImgList = ImageDf
Then I am trying to rename the columns of OldImgList and NewImgList so that they have the correct column names in ListDf when the join occurs:
OldImgList.columns = ['OldDateMod', 'OldFileName', 'OldPath']
NewImgList.columns = ['NewDateMod', 'NewFileName', 'NewPath']
However, when I run the first line of code, the column names for both OldImgList and NewImgList get set to the names in the first line. Then when I run the second line, OldImgList takes the column names of NewImgList. What is going on here?
Upvotes: 0
Views: 38
Reputation: 1322
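What you've done is create two references to the same dataframe. Assignment with = does not copy any data; it only binds another name to the same object, so renaming the columns through either name changes what both names (and ImageDf itself) see.
If you want to create two independent copies of a dataframe, you can use the df.copy() method: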
OldImgList = ImageDf.copy(deep=True)
NewImgList = ImageDf.copy(deep=True)
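To make the difference concrete, here is a minimal sketch comparing plain assignment with df.copy(). The contents of ImageDf and its original column names (DateMod, FileName, Path) are made up for illustration, since the question does not show them.

```python
import pandas as pd

# Hypothetical stand-in for ImageDf; the real column names are not given in the question.
ImageDf = pd.DataFrame({
    "DateMod": ["2020-01-01", "2020-02-01"],
    "FileName": ["a.jpg", "b.jpg"],
    "Path": ["/img/a.jpg", "/img/b.jpg"],
})

alias = ImageDf                 # plain assignment: a second name for the same object
independent = ImageDf.copy()    # a separate DataFrame (deep=True is the default)

# Renaming through the alias also renames ImageDf, because they are one object.
alias.columns = ["OldDateMod", "OldFileName", "OldPath"]

same_object = alias is ImageDf
original_cols = list(ImageDf.columns)
copy_cols = list(independent.columns)

print(same_object)
print(original_cols)
print(copy_cols)
```

The copy keeps its original column names, while ImageDf picks up the rename made through the alias.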
I'm not following your sequence of joins, but there is likely a way to do it without making two copies of your dataframe (which will use more memory). For example, you can supply lsuffix or rsuffix arguments to df.join() to add a string suffix to the left or right dataframe in a join so that matching columns are distinguished from each other (see the docs for more).
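As a rough sketch of the suffix approach, the two left joins might look something like the following. The sample data, the column names in ImageDf, and the "_old"/"_new" suffixes are assumptions for illustration; it also assumes file names are unique in ImageDf. Note that set_index() still builds one indexed lookup table, but no renamed duplicates of ImageDf are needed.

```python
import pandas as pd

# Hypothetical data: the question does not show the real frames.
ListDf = pd.DataFrame({
    "OldFileName": ["a_v1.jpg", "b_v1.jpg"],
    "NewFileName": ["a_v2.jpg", "b_v2.jpg"],
})
ImageDf = pd.DataFrame({
    "DateMod": ["2020-01-01", "2020-01-02", "2021-01-01", "2021-01-02"],
    "FileName": ["a_v1.jpg", "b_v1.jpg", "a_v2.jpg", "b_v2.jpg"],
    "Path": ["/old/a_v1.jpg", "/old/b_v1.jpg", "/new/a_v2.jpg", "/new/b_v2.jpg"],
})

# Index the master list by file name so it can be joined against a column of ListDf.
lookup = ImageDf.set_index("FileName")

# First left join: bring in DateMod and Path for the old images.
step1 = ListDf.join(lookup, on="OldFileName", how="left")

# Second left join: DateMod and Path now exist on both sides, so the suffixes
# distinguish the columns from the first join ("_old") from the new ones ("_new").
result = step1.join(lookup, on="NewFileName", how="left",
                    lsuffix="_old", rsuffix="_new")

print(result)
```

The result keeps one row per row of ListDf, with DateMod_old, Path_old, DateMod_new, and Path_new alongside the two file name columns.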
Upvotes: 1