Henrik K
Henrik K

Reputation: 1061

How to keep NaN in pivot table?

Looking to preserve NaN values when changing the shape of the dataframe.

These two questions may be related:

but not been able to use the answers provided - can I set a min count for np.sum somehow?

import pandas as pd
import numpy as np
df = pd.DataFrame([['Y1', np.nan], ['Y2', np.nan], ['Y1', 6], ['Y2',8]], columns=['A', 'B'], index=['1988-01-01','1988-01-01', '1988-01-04', '1988-01-04'])
df.index.name = 'Date'
df

pivot_df = pd.pivot_table(df, values='B', index=['Date'], columns=['A'],aggfunc=np.sum)
pivot_df

The output is:

A   Y1  Y2
Date        
1988-01-01  0.0 0.0
1988-01-04  6.0 8.0

and the desired output is:

A   Y1  Y2
Date        
1988-01-01  NaN NaN
1988-01-04  6.0 8.0

Upvotes: 6

Views: 12694

Answers (6)

Diego Palacios
Diego Palacios

Reputation: 1144

I also wanted to keep NaN values, and I also wanted to keep using the pivot_table function. So here is my solution:

pivot_df = pd.pivot_table(
    df,
    values='B',
    index=['Date'],
    columns=['A'],
    aggfunc=lambda x: x.sum(min_count=1),
    dropna=False
)

The downside is that this is less efficient in terms of computation time.

Upvotes: 0

skyseeker
skyseeker

Reputation: 5

Try to add 'dropna= False' to your pivot_table function?

Upvotes: -2

user3483203
user3483203

Reputation: 51155

If you have no duplicate entries, use set_index + unstack

df.set_index('A', append=True)['B'].unstack(-1)

A            Y1   Y2
Date
1988-01-01  NaN  NaN
1988-01-04  6.0  8.0

If you have duplicates, use a groupby with min_count

>> df

             A    B
Date
1988-01-01  Y1  NaN
1988-01-01  Y2  NaN
1988-01-04  Y1  6.0
1988-01-04  Y2  8.0
1988-01-01  Y1  NaN
1988-01-01  Y2  NaN
1988-01-04  Y1  6.0
1988-01-04  Y2  8.0

df.set_index('A', append=True).groupby(level=[0, 1])['B'].sum(min_count=1).unstack(-1)

A             Y1    Y2
Date
1988-01-01   NaN   NaN
1988-01-04  12.0  16.0

Upvotes: 1

Serge Ballesta
Serge Ballesta

Reputation: 148975

It is possible to count the values, and drop when 0 (or less than the expected count):

pivot_df = pd.pivot_table(df, values='B', index=['Date'], columns=['A'],
                          aggfunc=['sum','count'])

# build the mask from count
mask = (pivot_df.xs('count', axis=1) == 0)   # or ...<min_limit

#build the actual pivot_df from sum
pivot_df = pivot_df.xs('sum', axis=1)

# and reset to NaN when not enough values
pivot_df[mask] = np.nan

It gives as expected:

A            Y1   Y2
Date                
1988-01-01  NaN  NaN
1988-01-04  6.0  8.0

This one will give sensible result when you sum more than one value.

Upvotes: 1

Henrik K
Henrik K

Reputation: 1061

From the helpful comments the following solution meets my requirements:


pivot_df_2 = pd.pivot_table(df, values='B', index=['Date'], columns=['A'],aggfunc=min, dropna=False)
pivot_df_2

Values are supposed to be unique per slot so replacing the sum function with a min function shouldn't make a difference (in my case)

Upvotes: 3

Quang Hoang
Quang Hoang

Reputation: 150765

In this case, I would resolve by groupby:

(df.groupby(['Date', 'A']).B
   .apply(lambda x: np.nan if x.isna().all() else x.sum())
   .unstack('A')
)

output:

A            Y1   Y2
Date                
1988-01-01  NaN  NaN
1988-01-04  6.0  8.0

Change isna().all() to isna().any() if needed.

Upvotes: 1

Related Questions