Thomas Matthew

Reputation: 2886

Set diagonal triangle in pandas DataFrame to NaN

Given the below dataframe:

import pandas as pd
import numpy as np
a = np.arange(16).reshape(4, 4)
df = pd.DataFrame(data=a, columns=['a','b','c','d'])

I'd like to produce the following result:

df([[ NaN,  1,  2,  3],
    [ NaN,  NaN,  6,  7],
    [ NaN,  NaN,  NaN, 11],
    [ NaN,  NaN,  NaN,  NaN]])

So far I've tried using np.tril_indices, but it only works once the df is turned back into a numpy array, and it only works for integer assignments (not np.nan):

il1 = np.tril_indices(4)
a[il1] = 0

gives:

array([[ 0,  1,  2,  3],
       [ 0,  0,  6,  7],
       [ 0,  0,  0, 11],
       [ 0,  0,  0,  0]])

...which is almost what I'm looking for, but assigning NaN instead (a[il1] = np.nan) fails with:

ValueError: cannot convert float NaN to integer

while:

df[il1] = 0

gives:

TypeError: unhashable type: 'numpy.ndarray'

So if I want to fill the bottom triangle of a dataframe with NaN, does it 1) have to be a numpy array, or can I do this with pandas directly? And 2) Is there a way to fill bottom triangle with NaN rather than using numpy.fill_diagonal and incrementing the offset row by row down the whole DataFrame?

Another failed solution: filling the lower triangle of the np array with zeros, then masking on zero and reassigning to np.nan. It also converts zero values above the diagonal to NaN when they should be preserved as zero!

Upvotes: 5

Views: 3718

Answers (2)

Divakar

Reputation: 221574

An approach using np.where -

m,n = df.shape
df[:] = np.where(np.arange(m)[:,None] >= np.arange(n),np.nan,df)

Sample run -

In [93]: df
Out[93]: 
    a   b   c   d
0   0   1   2   3
1   4   5   6   7
2   8   9  10  11
3  12  13  14  15

In [94]: m,n = df.shape

In [95]: df[:] = np.where(np.arange(m)[:,None] >= np.arange(n),np.nan,df)

In [96]: df
Out[96]: 
    a    b    c     d
0 NaN  1.0  2.0   3.0
1 NaN  NaN  6.0   7.0
2 NaN  NaN  NaN  11.0
3 NaN  NaN  NaN   NaN
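
For the "can I do this with pandas directly?" part of the question, here is a minimal sketch that builds the same boolean mask and hands it to DataFrame.mask. The names lower and masked are only illustrative. The example also checks that the broadcast comparison above is equivalent to np.tril applied to an all-True array.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(16).reshape(4, 4), columns=list("abcd"))

m, n = df.shape
# Row position >= column position marks the diagonal and everything below it.
lower = np.arange(m)[:, None] >= np.arange(n)

# The same mask, built with np.tril on an all-True array.
assert (lower == np.tril(np.ones((m, n), dtype=bool))).all()

# DataFrame.mask replaces entries where the condition is True (NaN by default)
# and returns a new frame; df itself is left unchanged.
masked = df.mask(lower)
print(masked)
```

Because mask returns a new object, the original integer frame is still available afterwards. Whether that or the in-place df[:] assignment is preferable depends on the use case.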

Upvotes: 6

jezrael

Reputation: 862681

You need to cast a to float, because NaN is a float and cannot be stored in an integer array:

import numpy as np
import pandas as pd
a = np.arange(16).reshape(4, 4).astype(float)
print (a)
[[  0.   1.   2.   3.]
 [  4.   5.   6.   7.]
 [  8.   9.  10.  11.]
 [ 12.  13.  14.  15.]]


il1 = np.tril_indices(4)
a[il1] = np.nan
print (a)
[[ nan   1.   2.   3.]
 [ nan  nan   6.   7.]
 [ nan  nan  nan  11.]
 [ nan  nan  nan  nan]]

df = pd.DataFrame(data=a, columns=['a','b','c','d'])
print (df)
    a    b    c     d
0 NaN  1.0  2.0   3.0
1 NaN  NaN  6.0   7.0
2 NaN  NaN  NaN  11.0
3 NaN  NaN  NaN   NaN
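
If you start from an existing DataFrame rather than the raw array, the same cast-then-assign idea can be wrapped in a small function. This is only a sketch: nan_lower_triangle is a hypothetical helper name, not a pandas or numpy function, and it assumes every column can be converted to float.

```python
import numpy as np
import pandas as pd


def nan_lower_triangle(frame, k=0):
    """Hypothetical helper: float copy of frame with the lower triangle set to NaN."""
    # Cast to float first, since NaN cannot be stored in an integer array.
    values = frame.to_numpy(dtype=float, copy=True)
    # tril_indices_from also handles non-square shapes; k shifts the diagonal.
    values[np.tril_indices_from(values, k=k)] = np.nan
    return pd.DataFrame(values, index=frame.index, columns=frame.columns)


df = pd.DataFrame(np.arange(16).reshape(4, 4), columns=list("abcd"))
result = nan_lower_triangle(df)
print(result)
```

Passing k=-1 would keep the main diagonal and blank only the entries strictly below it.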

Upvotes: 8
