Reputation: 191
I have a Pandas DataFrame that looks like the following, with an hourly time index and column names corresponding to particular locations:
df1.head()
33643 33667 33668 33677 33678 33680 33681 33686
1998-01-01 10:00:00 NaN NaN NaN NaN NaN NaN NaN NaN
1998-01-01 11:00:00 NaN NaN NaN NaN NaN NaN NaN NaN
1998-01-01 12:00:00 NaN NaN NaN NaN NaN NaN NaN NaN
1998-01-01 13:00:00 NaN NaN NaN NaN NaN NaN NaN NaN
1998-01-01 14:00:00 322.0 343.0 323.0 341.0 342.0 342.0 341.0 343.0
I have a second Pandas DataFrame with the same index and column names, with a different variable.
df2.head()
33643 33667 33668 33677 33678 33680 33681
1998-01-01 10:00:00 289.59 286.07 286.52 284.77 285.17 284.17 284.44
1998-01-01 11:00:00 289.83 286.31 286.76 285.00 285.40 284.39 284.66
1998-01-01 12:00:00 290.08 286.63 287.08 285.33 285.73 284.73 284.99
1998-01-01 13:00:00 290.39 287.27 287.68 286.10 286.46 285.56 285.80
1998-01-01 14:00:00 291.10 289.14 289.39 288.42 288.64 288.09 288.23
I would like to find where the values in df1 are NaN and, for those cells, insert NaN into the corresponding cells of df2. I have tried looking at the docs, but I haven't found them of much use, and other Stack Overflow pages haven't been super helpful either. Here is my desired output:
df3.head()
33643 33667 33668 33677 33678 33680 33681 33686
1998-01-01 10:00:00 NaN NaN NaN NaN NaN NaN NaN NaN
1998-01-01 11:00:00 NaN NaN NaN NaN NaN NaN NaN NaN
1998-01-01 12:00:00 NaN NaN NaN NaN NaN NaN NaN NaN
1998-01-01 13:00:00 NaN NaN NaN NaN NaN NaN NaN NaN
1998-01-01 14:00:00 291.10 289.14 289.39 288.42 288.64 288.09 288.23 289.12
Things I have tried:
df3 = pd.DataFrame(np.where(df1[:] == np.nan, df2[:] == np.nan, df2[:]))
df3 = df2.where(df1.iloc[:,:] == np.nan, np.nan)
Any suggestions? Am I on the right path?
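For reference, here is a minimal way to rebuild the first two columns of these frames, just so the example is reproducible (values are copied from the heads above; I've used string column labels here, which is only a convenience):
import numpy as np
import pandas as pd

idx = pd.date_range("1998-01-01 10:00", periods=5, freq="H")

# First two location columns, values taken from the df1/df2 heads shown above
df1 = pd.DataFrame({"33643": [np.nan, np.nan, np.nan, np.nan, 322.0],
                    "33667": [np.nan, np.nan, np.nan, np.nan, 343.0]}, index=idx)
df2 = pd.DataFrame({"33643": [289.59, 289.83, 290.08, 290.39, 291.10],
                    "33667": [286.07, 286.31, 286.63, 287.27, 289.14]}, index=idx)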
Upvotes: 2
Views: 771
Reputation: 2424
Actually, you're not far from the solution; you're thinking along the right lines. If I had to recommend a method, I would recommend this one, using pandas.notnull:
df3 = df2[pd.notnull(df1)]
You just pick the cells from df2 where df1 is not null and put them in the corresponding cells of df3; all the others are set to NaN automatically.
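As a side note, the reason your df1 == np.nan comparisons never matched is that NaN does not compare equal to anything, including itself; you have to test with notnull/isnull instead. With that change your DataFrame.where attempt works too. A quick sketch of the equivalent call:
# Equivalent to df2[pd.notnull(df1)]: keep df2's value where df1 is not null,
# otherwise fall back to NaN (the default "other" value of DataFrame.where).
df3 = df2.where(df1.notnull())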
If you still want to use np.where, you could do something like this with pandas.isnull:
df3 = pd.DataFrame(np.where(pd.isnull(df1), np.nan, df2))
but then you'll have to pass the index and columns parameters to restore the labels, since np.where returns a plain array. This method is also a bit slower, so I wouldn't really use it personally.
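For completeness, a sketch of that variant with the labels put back (assuming df1 and df2 share the same index and columns, as in your example):
# Same np.where approach, but rebuilding the labels from df1,
# since np.where returns a plain NumPy array without index/columns.
df3 = pd.DataFrame(np.where(pd.isnull(df1), np.nan, df2),
                   index=df1.index, columns=df1.columns)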
Hope this was helpful.
Output (I only took your first two columns in my sample, just to go faster):
33643 33667
1998-01-01 10:00:00 NaN NaN
1998-01-01 11:00:00 NaN NaN
1998-01-01 12:00:00 NaN NaN
1998-01-01 13:00:00 NaN NaN
1998-01-01 14:00:00 291.1 289.14
Upvotes: 2