peter_c
peter_c

Reputation: 49

Remove all values below certain threshold and shift columns up in Pandas

I have growth data. I would like to calibrate all the columns to a certain (arbitrary) cutoff by removing all values below this threshold and "shift" the values up in each individual column.

To illustrate:

import pandas as pd
df = pd.DataFrame([[1, 2], [3, 4],[5, 6]], columns=list('AB'))

result:

    A   B
0   1   2
1   3   4
2   5   6

Removing all values below 3:

df = df.where(df > 3, np.nan)

result:

    A   B
0   NaN NaN
1   NaN 4
2   5   6

What I'd finally want is the following dataframe (in a sense cutting and pasting values more than 3 to the top of the df):

    A   B
0   5   4
1   NaN 6
2   NaN NaN

Any idea how I would be able to do so?

Upvotes: 1

Views: 3571

Answers (2)

Daweo
Daweo

Reputation: 36570

I would do it by using built-in Python sorted following way:

import numpy as np
import pandas as pd
df = pd.DataFrame([[1, 2], [3, 6],[5, 4]], columns=list('AB'))
df = df.where(df > 3, np.nan)
print(df)

Output:

     A    B
0  NaN  NaN
1  NaN  6.0
2  5.0  4.0

Then just do:

for col in df.columns:
    df[col] = sorted(df[col], key=pd.isnull)
print(df)

Output:

     A    B
0  5.0  6.0
1  NaN  4.0
2  NaN  NaN

I harness fact that built-in sorted is stable (note that I slightly changed input (6 before 4) to show that). isnull function produce False for all non-NaNs which is treated as 0 during sorting and True for rest which is treated as 1 during sorting.

Upvotes: 1

jezrael
jezrael

Reputation: 863056

Use justify function for improve performance:

df = pd.DataFrame([[1, 2], [3, 4],[5, 6]], columns=list('AB'))

df = df.where(df > 3, np.nan)
arr = justify(df.to_numpy(), invalid_val=np.nan, axis=0, side='up')
#oldier pandas versions
arr = justify(df.values, invalid_val=np.nan, axis=0, side='up')
df = pd.DataFrame(arr, columns=df.columns)
print (df)
     A    B
0  5.0  4.0
1  NaN  6.0
2  NaN  NaN

Function by divakar:

def justify(a, invalid_val=0, axis=1, side='left'):    
    """
    Justifies a 2D array

    Parameters
    ----------
    A : ndarray
        Input array to be justified
    axis : int
        Axis along which justification is to be made
    side : str
        Direction of justification. It could be 'left', 'right', 'up', 'down'
        It should be 'left' or 'right' for axis=1 and 'up' or 'down' for axis=0.

    """

    if invalid_val is np.nan:
        mask = ~np.isnan(a)
    else:
        mask = a!=invalid_val
    justified_mask = np.sort(mask,axis=axis)
    if (side=='up') | (side=='left'):
        justified_mask = np.flip(justified_mask,axis=axis)
    out = np.full(a.shape, invalid_val) 
    if axis==1:
        out[justified_mask] = a[mask]
    else:
        out.T[justified_mask.T] = a.T[mask.T]
    return out

Upvotes: 1

Related Questions