user1862963

Reputation: 79

Masking DataFrame elements in pandas

I have a square matrix stored as a DataFrame, and I want to collect all the values above the diagonal into a Series. My idea was to mask all elements below the diagonal and then flatten the DataFrame into a Series, but the result includes the NaN values as well. Here is an example:

import numpy as np
import pandas as pd

users=[1,2,3,4,5]
cols=range(1,6)

matrix=pd.DataFrame(np.random.randn(len(users),len(cols)), index=users,columns=cols)
# True below the diagonal, False on and above it
mask = np.ones(matrix.shape,dtype='bool')
mask[np.triu_indices(len(matrix))] = False
series=matrix.mask(mask).values.ravel()

In the result I get all the upper triangle values, but also a NaN for every element of the lower triangle. Obviously I have misunderstood something: I thought that masked elements in a DataFrame would simply not be used. Does anybody know how I could do this?
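The behaviour in the question comes from the fact that DataFrame.mask does not remove anything: it replaces the cells where the condition is True with NaN and keeps the original shape, so ravel() still returns every cell. A minimal sketch on a small 3x3 frame with known values (an assumed stand-in for the random matrix) makes this visible:

```python
import numpy as np
import pandas as pd

# Small 3x3 example with known values so the effect is easy to see
matrix = pd.DataFrame(np.arange(9.0).reshape(3, 3), index=[1, 2, 3], columns=[1, 2, 3])

# True below the diagonal, False on and above it (same construction as the question)
mask = np.ones(matrix.shape, dtype=bool)
mask[np.triu_indices(len(matrix))] = False

masked = matrix.mask(mask)    # masked cells become NaN; the shape stays 3x3
flat = masked.values.ravel()  # ravel keeps every cell, NaN included

print(masked.shape)           # (3, 3)
print(np.isnan(flat).sum())   # 3 NaN values from the lower triangle
print(flat[~np.isnan(flat)])  # [0. 1. 2. 4. 5. 8.]
```

So the NaN cells have to be dropped explicitly, or the wanted cells selected by position, which is what the options in the answer do.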

Upvotes: 1

Views: 192

Answers (1)

piRSquared

Reputation: 294218

Option 1
Use pd.DataFrame.stack, as it drops the NaN values for you.

matrix.mask(mask).stack().values

array([ 0.6022148 , -0.19275783, -0.54066832,  1.95690678,  0.23993172,
        0.27107843,  2.29409865, -0.70446894, -0.93153835, -0.26430007,
       -0.29887114,  1.83132652,  1.54226746,  0.50651577, -0.51001179])
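A sketch of Option 1 that keeps the result as a Series (dropping .values), so each value retains the (row, column) label it came from, which is closer to what the question asked for. The chained .dropna() is an added safeguard, not part of the original answer: pandas 2.1 introduced a new stack implementation (opt-in via future_stack=True, intended to become the only one in pandas 3.0) that no longer drops NaN values, so relying on stack() alone may stop working on newer versions.

```python
import numpy as np
import pandas as pd

users = [1, 2, 3, 4, 5]
cols = range(1, 6)
matrix = pd.DataFrame(np.random.randn(len(users), len(cols)), index=users, columns=cols)

# True below the diagonal, False on and above it
mask = np.ones(matrix.shape, dtype=bool)
mask[np.triu_indices(len(matrix))] = False

# stack() turns the 2-D frame into a Series indexed by (row, column) pairs.
# .dropna() removes the masked cells explicitly, so this does not depend on
# whether the installed pandas version drops NaN inside stack() itself.
upper = matrix.mask(mask).stack().dropna()

print(len(upper))  # 15 cells on or above the diagonal of a 5x5 matrix
print(upper.head())
```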

Option 2
Use np.where to find the positions that are not masked (~mask), then index the underlying array with them.

i, j = np.where(~mask)
matrix.values[i, j]

array([ 0.6022148 , -0.19275783, -0.54066832,  1.95690678,  0.23993172,
        0.27107843,  2.29409865, -0.70446894, -0.93153835, -0.26430007,
       -0.29887114,  1.83132652,  1.54226746,  0.50651577, -0.51001179])
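As a side note on Option 2, NumPy boolean indexing with the inverted mask selects the same cells in the same row-major order without building i and j first. A minimal sketch comparing the two, assuming the same matrix and mask construction as in the question:

```python
import numpy as np
import pandas as pd

matrix = pd.DataFrame(np.random.randn(5, 5), index=[1, 2, 3, 4, 5], columns=range(1, 6))
mask = np.ones(matrix.shape, dtype=bool)
mask[np.triu_indices(len(matrix))] = False

values = matrix.to_numpy()

# Option 2 as written: row/column positions of the unmasked cells
i, j = np.where(~mask)
via_where = values[i, j]

# Boolean indexing with the inverted mask picks the same cells directly,
# returned in the same row-major order
via_bool = values[~mask]

print(np.array_equal(via_where, via_bool))  # True
```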

Option 2B
Skip the mask entirely and index the upper triangle directly.

i, j = np.triu_indices(len(matrix))
matrix.values[i, j]

array([ 0.6022148 , -0.19275783, -0.54066832,  1.95690678,  0.23993172,
        0.27107843,  2.29409865, -0.70446894, -0.93153835, -0.26430007,
       -0.29887114,  1.83132652,  1.54226746,  0.50651577, -0.51001179])
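Note that np.triu_indices(n) includes the diagonal, which is why a 5x5 matrix yields 15 values. If only the values strictly above the diagonal are wanted, np.triu_indices accepts an offset k; with k=1 the diagonal is excluded. A sketch under that assumption, which also wraps the result in a Series labelled with the original row and column labels:

```python
import numpy as np
import pandas as pd

users = [1, 2, 3, 4, 5]
cols = range(1, 6)
matrix = pd.DataFrame(np.random.randn(len(users), len(cols)), index=users, columns=cols)

# k=1 shifts the triangle one step to the right, excluding the diagonal itself
i, j = np.triu_indices(len(matrix), k=1)

# Pair each value with its (row label, column label) so the Series stays traceable
upper = pd.Series(
    matrix.to_numpy()[i, j],
    index=pd.MultiIndex.from_arrays([matrix.index[i], matrix.columns[j]]),
)

print(len(upper))  # 10 cells strictly above the diagonal of a 5x5 matrix
```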

Upvotes: 1
