TJG
TJG

Reputation: 197

Pandas DataFrame - Test for change/modification

Simple question, and my google-fu is not strong enough to find the right term to get a solid answer from the documentation. Any term I look for that includes either change or modify leads me to questions like 'How to change column name....'

I am reading in a large dataframe, and I may be adding new columns to it. These columns are based on interpolation of values on a row by row basis, and the simple numbers of rows makes this process a couple hours in length. Hence, I save the dataframe, which also can take a bit of time - 30 seconds at least.

My current code will always save the dataframe, even if I have not added any new columns. Since I am still developing some plotting tools around it, I am wasting a lot of time waiting for the save to finish at the termination of the script needlessly.

Is there a DataFrame attribute I can test to see if the DataFrame has been modified? Essentially, if this is False I can avoid saving at the end of the script, but if it is True then a save is necessary. This simple one line if will save me a lot of time and a lost of SSD writes!

Upvotes: 1

Views: 184

Answers (1)

Vinícius Figueiredo
Vinícius Figueiredo

Reputation: 6518

You can use:

df.equals(old_df)

You can read the it's functionality in pandas' documentation. It basically does what you want, returning True only if both DataFrames are equal, and it's probably the fastest way to do it since it's an implementation of pandas itself.

Notice you need to use .copy() when assigning old_df before changes in your current df, otherwise you might pass the dataframe by reference and not by value.

Upvotes: 1

Related Questions