Reputation: 395
I have the following code and it gives me a very strange error. My goal is to backfill the missing values in the data with a different label. The error happens at this line: df_filled[is_filled]
If I change the assignment to df_filled=df.asfreq(freq='D').fillna(method='bfill', limit=1).dropna(how='all').drop_duplicates(keep='last')
everything works fine, but with freq='2D' the mask used in df_filled[is_filled] is no longer Boolean.
from datetime import datetime, timedelta
import pandas as pd
import numpy as np
import random
##Generate the Data
np.random.seed(11)
date_today = datetime.now()
ndays = 15
df = pd.DataFrame({'date': [date_today + timedelta(days=(abs(np.random.randn(1))*2)[0]*x) for x in range(ndays)],
'test': pd.Series(np.random.randn(ndays)), 'test2':pd.Series(np.random.randn(ndays))})
df1=pd.DataFrame({'date': [date_today + timedelta(hours=x) for x in range(ndays)],
'test': pd.Series(np.random.randn(ndays)), 'test2':pd.Series(np.random.randn(ndays))})
df2=pd.DataFrame({'date': [date_today + timedelta(days=x)-timedelta(seconds=100*x) for x in range(ndays)],
'test': pd.Series(np.random.randn(ndays)), 'test2':pd.Series(np.random.randn(ndays))})
df = pd.concat([df, df1, df2])  # DataFrame.append was removed in pandas 2.0
df = df.set_index('date').sort_index()
df = df.mask(np.random.random(df.shape) < .7)
df=df.reset_index()
df['test']=df['test'].astype(str)
df['test2']=df['test2'].astype(str)
df.replace('nan', np.nan, inplace = True)
##
df.set_index(df['date'].dt.date, inplace = True)
df = df[~df.index.duplicated(keep='first')]
df_filled=df.asfreq(freq='2D').fillna(method='bfill', limit=2).dropna(how='all').drop_duplicates(keep='last')
df_filled.set_index(df_filled['date'],inplace=True)
df_filled=df_filled.drop('date', axis=1)
df.set_index(df['date'],inplace=True)
df=df.drop('date', axis=1)
is_filled = (df.isnull() & df_filled.notnull()) | df.notnull()
df_filled[is_filled] ## error happens here
df_filled[is_filled]=df_filled[is_filled].applymap(lambda x: '_2D' if pd.notnull(x) else np.nan)
output:
ValueError: Must pass DataFrame with boolean values only
I appreciate your help in advance.
Upvotes: 1
Views: 3080
Reputation: 13175
If you print is_filled, that is (df.isnull() & df_filled.notnull()) | df.notnull(), you will see that it contains a mixture of True and NaN. So the solution is to replace the NaN values with False.
The bottom snippet of code then becomes:
df=df.drop('date', axis=1)
is_filled = (df.isnull() & df_filled.notnull()) | df.notnull()
is_filled = is_filled.fillna(False) # Fix here
df_filled[is_filled]=df_filled[is_filled].applymap(lambda x: '_2D' if pd.notnull(x) else np.nan)
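To see the idea in isolation, here is a minimal sketch on toy data. The frames and values are hypothetical, not the ones from the question. It assumes the NaN entries come from aligning a mask to an index that contains labels the other frame lacks, which is reproduced here with an explicit reindex. The trailing astype(bool) is an extra safeguard, since fillna alone may leave an object dtype in some pandas versions.

```python
import numpy as np
import pandas as pd

# Toy frames (hypothetical data): df_filled lacks one of df's index labels.
df = pd.DataFrame({"test": ["a", np.nan, "c"]}, index=[0, 1, 2])
df_filled = pd.DataFrame({"test": ["a", "b"]}, index=[0, 1])

# Aligning the boolean mask to df's index leaves NaN for the missing label 2,
# so the column is no longer a pure boolean column.
raw_mask = df_filled.notnull().reindex(df.index)

# Replace the gaps with False and force a real boolean dtype.
mask = raw_mask.fillna(False).astype(bool)

# Boolean-frame indexing now works; unselected cells come back as NaN.
selected = df[mask]
print(mask)
print(selected)
```

Whether an unclean mask raises an error or is silently accepted seems to depend on the pandas version, so cleaning the mask explicitly is the safer choice.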
Upvotes: 2