Reputation: 191
I have a Pandas DataFrame that looks like the following, with an hourly time index and column names corresponding to particular locations:
df1.head()
33643 33667 33668 33677 33678 33680 33681 33686
1998-01-01 10:00:00 NaN NaN NaN NaN NaN NaN NaN NaN
1998-01-01 11:00:00 NaN NaN NaN NaN NaN NaN NaN NaN
1998-01-01 12:00:00 NaN NaN NaN NaN NaN NaN NaN NaN
1998-01-01 13:00:00 NaN NaN NaN NaN NaN NaN NaN NaN
1998-01-01 14:00:00 322.0 343.0 323.0 341.0 342.0 342.0 341.0 343.0
I have a second Pandas DataFrame with the same index and column names, with a different variable.
df2.head()
33643 33667 33668 33677 33678 33680 33681
1998-01-01 10:00:00 289.59 286.07 286.52 284.77 285.17 284.17 284.44
1998-01-01 11:00:00 289.83 286.31 286.76 285.00 285.40 284.39 284.66
1998-01-01 12:00:00 290.08 286.63 287.08 285.33 285.73 284.73 284.99
1998-01-01 13:00:00 290.39 287.27 287.68 286.10 286.46 285.56 285.80
1998-01-01 14:00:00 291.10 289.14 289.39 288.42 288.64 288.09 288.23
I would like to find where the values in df1 are NaN and, for those cells, insert NaN into the corresponding cells of df2. I have tried looking at the docs, but I haven't found them of much use, and other Stack Overflow pages haven't been super helpful either. Here is my desired output:
df3.head()
33643 33667 33668 33677 33678 33680 33681 33686
1998-01-01 10:00:00 NaN NaN NaN NaN NaN NaN NaN NaN
1998-01-01 11:00:00 NaN NaN NaN NaN NaN NaN NaN NaN
1998-01-01 12:00:00 NaN NaN NaN NaN NaN NaN NaN NaN
1998-01-01 13:00:00 NaN NaN NaN NaN NaN NaN NaN NaN
1998-01-01 14:00:00 291.10 289.14 289.39 288.42 288.64 288.09 288.23 289.12
Things I have tried:
df3 = pd.DataFrame(np.where(df1[:] == np.nan, df2[:] == np.nan, df2[:]))
df3 = df2.where(df1.iloc[:,:] == np.nan, np.nan)
Any suggestions? Am I on the right path?
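For reference, here is a minimal way to rebuild the first two columns of these frames, just so the example is reproducible (values are copied from the heads above; I've used string column labels here, which is only a convenience):
import numpy as np
import pandas as pd

idx = pd.date_range("1998-01-01 10:00", periods=5, freq="H")

# First two location columns, values taken from the df1/df2 heads shown above
df1 = pd.DataFrame({"33643": [np.nan, np.nan, np.nan, np.nan, 322.0],
                    "33667": [np.nan, np.nan, np.nan, np.nan, 343.0]}, index=idx)
df2 = pd.DataFrame({"33643": [289.59, 289.83, 290.08, 290.39, 291.10],
                    "33667": [286.07, 286.31, 286.63, 287.27, 289.14]}, index=idx)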
Upvotes: 2
Views: 771
Reputation: 2424
Actually, you're not far from the solution; you're thinking along the right lines. If I had to recommend a method, I would recommend this one, using pandas.notnull:
df3 = df2[pd.notnull(df1)]
You just pick the cells from df2 where df1 is not null and put them in the corresponding cells of df3; all the others are set to NaN automatically.
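As a side note, the reason your df1 == np.nan comparisons never matched is that NaN does not compare equal to anything, including itself; you have to test with notnull/isnull instead. With that change your DataFrame.where attempt works too. A quick sketch of the equivalent call:
# Equivalent to df2[pd.notnull(df1)]: keep df2's value where df1 is not null,
# otherwise fall back to NaN (the default "other" value of DataFrame.where).
df3 = df2.where(df1.notnull())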
If you still want to use np.where, you could do something like this with pandas.isnull:
df3 = pd.DataFrame(np.where(pd.isnull(df1), np.nan, df2))
but then you'll have to pass the index and columns parameters to restore the labels, since np.where returns a plain array. This method is also a bit slower, so I wouldn't really use it personally.
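For completeness, a sketch of that variant with the labels put back (assuming df1 and df2 share the same index and columns, as in your example):
# Same np.where approach, but rebuilding the labels from df1,
# since np.where returns a plain NumPy array without index/columns.
df3 = pd.DataFrame(np.where(pd.isnull(df1), np.nan, df2),
                   index=df1.index, columns=df1.columns)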
Hope this was helpful.
Output (I only took your first two columns in my sample, just to go faster):
33643 33667
1998-01-01 10:00:00 NaN NaN
1998-01-01 11:00:00 NaN NaN
1998-01-01 12:00:00 NaN NaN
1998-01-01 13:00:00 NaN NaN
1998-01-01 14:00:00 291.1 289.14
Upvotes: 2