Reputation: 49
I have growth data. I would like to calibrate all the columns to a certain (arbitrary) cutoff by removing all values below this threshold and "shift" the values up in each individual column.
To illustrate:
import pandas as pd
df = pd.DataFrame([[1, 2], [3, 4],[5, 6]], columns=list('AB'))
result:
A B
0 1 2
1 3 4
2 5 6
Removing all values below 3:
df = df.where(df > 3, np.nan)
result:
A B
0 NaN NaN
1 NaN 4
2 5 6
What I'd finally want is the following dataframe (in a sense cutting and pasting values more than 3 to the top of the df):
A B
0 5 4
1 NaN 6
2 NaN NaN
Any idea how I would be able to do so?
Upvotes: 1
Views: 3571
Reputation: 36570
I would do it by using built-in Python sorted
following way:
import numpy as np
import pandas as pd
df = pd.DataFrame([[1, 2], [3, 6],[5, 4]], columns=list('AB'))
df = df.where(df > 3, np.nan)
print(df)
Output:
A B
0 NaN NaN
1 NaN 6.0
2 5.0 4.0
Then just do:
for col in df.columns:
df[col] = sorted(df[col], key=pd.isnull)
print(df)
Output:
A B
0 5.0 6.0
1 NaN 4.0
2 NaN NaN
I harness fact that built-in sorted
is stable (note that I slightly changed input (6
before 4
) to show that). isnull
function produce False
for all non-NaN
s which is treated as 0
during sorting and True
for rest which is treated as 1
during sorting.
Upvotes: 1
Reputation: 863056
Use justify
function for improve performance:
df = pd.DataFrame([[1, 2], [3, 4],[5, 6]], columns=list('AB'))
df = df.where(df > 3, np.nan)
arr = justify(df.to_numpy(), invalid_val=np.nan, axis=0, side='up')
#oldier pandas versions
arr = justify(df.values, invalid_val=np.nan, axis=0, side='up')
df = pd.DataFrame(arr, columns=df.columns)
print (df)
A B
0 5.0 4.0
1 NaN 6.0
2 NaN NaN
Function by divakar:
def justify(a, invalid_val=0, axis=1, side='left'):
"""
Justifies a 2D array
Parameters
----------
A : ndarray
Input array to be justified
axis : int
Axis along which justification is to be made
side : str
Direction of justification. It could be 'left', 'right', 'up', 'down'
It should be 'left' or 'right' for axis=1 and 'up' or 'down' for axis=0.
"""
if invalid_val is np.nan:
mask = ~np.isnan(a)
else:
mask = a!=invalid_val
justified_mask = np.sort(mask,axis=axis)
if (side=='up') | (side=='left'):
justified_mask = np.flip(justified_mask,axis=axis)
out = np.full(a.shape, invalid_val)
if axis==1:
out[justified_mask] = a[mask]
else:
out.T[justified_mask.T] = a.T[mask.T]
return out
Upvotes: 1