Reputation: 1259
I am concatenating multiple months of CSVs, where the newer, more recent versions have additional columns. As a result, putting them all together fills certain rows of certain columns with NaN.
The issue with this behavior is that it mixes these NaNs with true null entries from the data set, which need to be easily distinguishable.
My only solution as of now is to replace the original NaNs with a unique string, concatenate the CSVs, replace the new NaNs with a second unique string, and then replace the first unique string with NaN again.
Given the amount of data I am processing, this is a very inefficient solution. I thought there was some way to control how pandas fills these entries in the concatenated DataFrame, but I couldn't find anything on it.
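For reference, a minimal sketch of the placeholder workaround described above (the file names and sentinel strings are made up for illustration):
import numpy as np
import pandas as pd

# Hypothetical monthly files, oldest first
files = ['2018-01.csv', '2018-02.csv', '2018-03.csv']
TRUE_NULL = '__true_null__'   # marks NaNs already present in the data
PREDATED = '__predated__'     # marks NaNs created by the concatenation

frames = [pd.read_csv(f).fillna(TRUE_NULL) for f in files]   # protect real nulls
combined = pd.concat(frames, ignore_index=True)
combined = combined.fillna(PREDATED)            # label NaNs introduced by concat
combined = combined.replace(TRUE_NULL, np.nan)  # restore the real nulls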
Updated example:
A B
1 NaN
2 3
And append
A B C
1 2 3
Gives
A B C
1 NaN NaN
2 3 NaN
1 2 3
But I want
A B C
1 NaN 'predated'
2 3 'predated'
1 2 3
Upvotes: 1
Views: 48
Reputation: 42875
In case you have a core set of columns, as here represented by df1, you could apply .fillna() to the .difference() between the core set and any new columns in more recent DataFrames.
import numpy as np
import pandas as pd

df1 = pd.DataFrame(data={'A': [1, 2], 'B': [np.nan, 3]})
A B
0 1 NaN
1 2 3
df2 = pd.DataFrame(data={'A': 1, 'B': 2, 'C': 3}, index=[0])
A B C
0 1 2 3
df = pd.concat([df1, df2], ignore_index=True)
new_cols = df2.columns.difference(df1.columns).tolist()
df[new_cols] = df[new_cols].fillna(value='predated')
A B C
0 1 NaN predated
1 2 3 predated
2 1 2 3
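The same idea extends to a whole series of monthly CSVs; a sketch, assuming hypothetical file names and that the earliest file defines the core column set:
import pandas as pd

# Hypothetical monthly files, ordered oldest to newest
files = ['2018-01.csv', '2018-02.csv', '2018-03.csv']
frames = [pd.read_csv(f) for f in files]

core_cols = frames[0].columns           # earliest file defines the core set
df = pd.concat(frames, ignore_index=True)

# Fill only the columns that are not in the core set,
# leaving true nulls in the core columns untouched
new_cols = df.columns.difference(core_cols).tolist()
df[new_cols] = df[new_cols].fillna(value='predated')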
Upvotes: 2