Behinoo

Reputation: 395

Pandas: "Must pass DataFrame with boolean values only" when using asfreq

I have the following code, and it gives me a very strange error. My goal is to backfill the missing values in the data and mark the filled values with a different label. The error happens at the line df_filled[is_filled]. If I change the assignment to df_filled=df.asfreq(freq='D').fillna(method='bfill', limit=1).dropna(how='all').drop_duplicates(keep='last'), everything works fine, but with freq='2D' the mask is_filled used in df_filled[is_filled] is no longer Boolean.

    from datetime import datetime, timedelta
    import pandas as pd
    import numpy as np
    import random
    ##Generate the Data
    np.random.seed(11) 
    date_today = datetime.now()
    ndays = 15
    df = pd.DataFrame({'date': [date_today + timedelta(days=(abs(np.random.randn(1))*2)[0]*x) for x in range(ndays)], 
                       'test': pd.Series(np.random.randn(ndays)),     'test2':pd.Series(np.random.randn(ndays))})
    df1=pd.DataFrame({'date': [date_today + timedelta(hours=x) for x in range(ndays)], 
                       'test': pd.Series(np.random.randn(ndays)),     'test2':pd.Series(np.random.randn(ndays))})
    df2=pd.DataFrame({'date': [date_today + timedelta(days=x)-timedelta(seconds=100*x) for x in range(ndays)], 
                       'test': pd.Series(np.random.randn(ndays)),     'test2':pd.Series(np.random.randn(ndays))})
    # DataFrame.append was removed in pandas 2.0; pd.concat stacks the same rows
    df=pd.concat([df, df1, df2])
    df = df.set_index('date').sort_index()
    df = df.mask(np.random.random(df.shape) < .7)
    df=df.reset_index()
    df['test']=df['test'].astype(str)
    df['test2']=df['test2'].astype(str)
    df.replace('nan', np.nan, inplace = True)
    ##

    df.set_index(df['date'].dt.date, inplace = True) 

    df = df[~df.index.duplicated(keep='first')]
    df_filled=df.asfreq(freq='2D').fillna(method='bfill', limit=2).dropna(how='all').drop_duplicates(keep='last')
    df_filled.set_index(df_filled['date'],inplace=True)
    df_filled=df_filled.drop('date',1)
    df.set_index(df['date'],inplace=True)
    df=df.drop('date',1)
    is_filled = (df.isnull() & df_filled.notnull()) | df.notnull() 
    df_filled[is_filled] ## error happens here
    df_filled[is_filled]=df_filled[is_filled].applymap(lambda x: '_2D' if pd.notnull(x)  else np.nan)

output: ValueError: Must pass DataFrame with boolean values only

I appreciate your help in advance.

Upvotes: 1

Views: 3080

Answers (1)

roganjosh

Reputation: 13175

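If you print is_filled, that is, the result of (df.isnull() & df_filled.notnull()) | df.notnull(), you will see that it contains a mixture of True and NaN rather than only Boolean values. Presumably this is because df and df_filled no longer share the same index once you use freq='2D', so the element-wise operations align on labels that exist in only one of the two frames. The solution is to replace the NaN values with False: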

The bottom snippet of your code then becomes:

    df=df.drop('date',1)
    is_filled = (df.isnull() & df_filled.notnull()) | df.notnull()
    is_filled = is_filled.fillna(False) # Fix here
    df_filled[is_filled]=df_filled[is_filled].applymap(lambda x: '_2D' if pd.notnull(x)  else np.nan)
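To see the idea in isolation, here is a minimal sketch on made-up data (the frames values, raw_mask and clean_mask are hypothetical and not taken from your code). It shows that a mask holding True/False/NaN has dtype object, and that filling the NaN with False gives a mask pandas can index with. The extra astype(bool) is my own safeguard, on the assumption that some pandas versions leave the column as object after fillna. Whether the unfixed mask raises, and with which exact message, seems to depend on the pandas version, so the sketch does not try to reproduce the error itself.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, not the frames from the question.
values = pd.DataFrame({"a": ["x", "y", "z"], "b": ["p", "q", "r"]})

# A mask that mixes True/False with NaN has dtype object, not bool.
raw_mask = pd.DataFrame(
    {"a": [True, np.nan, False], "b": [np.nan, True, True]}, dtype=object
)

# Treat "unknown" (NaN) as False, then cast so every column is truly bool.
clean_mask = raw_mask.fillna(False).astype(bool)

# Boolean-DataFrame indexing keeps cells where the mask is True
# and turns every other cell into NaN.
selected = values[clean_mask]

print(raw_mask.dtypes.tolist())
print(clean_mask.dtypes.tolist())
print(selected)
```

Only the cells whose mask entry was explicitly True survive in selected; everything that was False or NaN in raw_mask comes back as NaN.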

Upvotes: 2
