Reputation: 630
I am using a multi-value pivot_table of this form:
pivot = df.pivot_table(index=[indices], columns=['column'], values=['start_value','end_value','delta','name','unit'], aggfunc='sum')
The dataframe df contains columns ['start_value','end_value','delta','name','unit'] all of dtype object. This is because 'name' & 'unit' are actually string columns, 'start_value', 'end_value' and 'delta' float columns. Object dtype is an attempt to make the pivot_table work, even though dtypes are different (content-wise).
When one of the values is non-nan, any nan value is converted to a 0, instead of a nan.
df:
indices, column, 'start_value','end_value','delta','name','unit'
A, '1nan', nan, 1000, nan, 'test', 'USD'
A, 'other', nan, nan, nan, 'test2', 'USD'
Results in pivot:
indices, ('1nan', 'start_value'), ('1nan', 'end_value'), ('1nan', 'delta'),('1nan', 'name'), ('1nan', 'unit'), ('other', 'start_value'), ('other', 'end_value'), ('other', 'delta'), ('other', 'name'), ('other', 'unit')
A, 0 [should be nan], 1000, 0 [should be nan], 'test','USD', nan, nan, nan, 'test2', 'USD'
Any suggestion on how to get nans instead of 0s?
Upvotes: 3
Views: 2984
Reputation: 11
Pass mysum()
function to the aggfunc parameter in pivot_table()
mysum = lambda x: sum(x)
The x
being passed to lambda is a pandas series, so another way to do this is as follows:
mysum = lambda x: x.sum(skipna=False)
Upvotes: 0
Reputation: 9
To get the result you want with a pivot table: You should populate first Nan values by 0:
df.fillna(0)
Then make your pivot_table.
Upvotes: 0
Reputation: 863301
Alternative solution is use GroupBy.sum
with parameter min_count=1
, but there are removed non numeric columns:
df = (df.groupby(['indices', 'column'])
['start_value','end_value','delta','name','unit']
.sum(min_count=1)
.unstack()
)
print (df)
start_value end_value delta
column '1nan' 'other' '1nan' 'other' '1nan' 'other'
indices
A NaN NaN 1000.0 NaN NaN NaN
because with pivot_table
are removed NaNs columns:
df = df.pivot_table(index=['indices'],
columns=['column'],
values=['start_value','end_value','delta','name','unit'],
aggfunc=lambda x: x.sum(min_count=1)
)
print (df)
end_value name unit
column '1nan' '1nan' 'other' '1nan' 'other'
indices
A 1000.0 'test' 'test2' 'USD' 'USD'
Upvotes: 2