Reputation: 2923
I have a dataframe with some NaN values in my s_x
columns. If NaN
values exist in them, I want them to be in the last columns.
Example: Given values in the s_x
columns of [Nan, 1, Nan, 2]
I want the values to shift left over the columns to result in [1, 2, NaN, NaN]
Example 2:
My current solution is very slow as I:
s_x
values into a listNaN
valuesNaN
valuesHow can I improve on the function below? The order of values (low to high) needs to remain the same. Every value is found only once in the s_x
columns of a row.
I know that "leaving the pandas-logic" by parsing to a list and back is problematic concerning performance and was thinking of trying to do it with a lambda function, but didn't get anywhere with it.
My current code as a minimal working example:
import pandas as pd
import numpy as np
def shift_values(df, leading_chars):
"""Shifts all values in columns with common leading chars to the left if there are NaN values.
Example: Given a row of [NaN, 1, NaN, 2]
the values are shifted to [1, 2, NaN, NaN]
"""
cols = [c for c in list(df.columns) if c[:len(leading_chars)] == leading_chars]
for index, row in df.iterrows():
# create list without NaN values
values = [v for v in row[cols] if not pd.isna(v)]
# pad with NaN to get correct number of values again
values += [np.nan] * (len(cols) - len(values))
# overwrite row values with modified list
for i, c in enumerate(cols):
row[c] = values[i]
# overwrite row in the dataframe
df.iloc[index] = row
return df
mylist = [["key", "s_1", "s_2", "s_3", "s_4"],
[1, np.nan, 1, 2, np.nan],
[1, 10, 20, 25, np.nan],
[1, 10, np.nan, 25, np.nan]
]
df = pd.DataFrame(mylist[1:], columns=mylist[0])
print("______ PREVIOUS ______")
print(df.head())
df = shift_values(df, 's_')
print("______ RESULT ______")
print(df.head())
Upvotes: 0
Views: 62
Reputation: 195478
Try:
df = df.transform(sorted, key=pd.isna, axis=1)
print(df)
Prints:
key s_1 s_2 s_3 s_4
0 1.0 1.0 2.0 NaN NaN
1 1.0 10.0 20.0 25.0 NaN
2 1.0 10.0 25.0 NaN NaN
EDIT: If columns are not next to each other:
x = df.filter(regex=r"^s_")
df.loc[:, x.columns] = df.loc[:, x.columns].transform(
sorted, key=pd.isna, axis=1
)
print(df)
Upvotes: 3
Reputation: 862911
For improve performance use justify
only with selected columns:
#https://stackoverflow.com/a/44559180/2901002
def justify(a, invalid_val=0, axis=1, side='left'):
"""
Justifies a 2D array
Parameters
----------
A : ndarray
Input array to be justified
axis : int
Axis along which justification is to be made
side : str
Direction of justification. It could be 'left', 'right', 'up', 'down'
It should be 'left' or 'right' for axis=1 and 'up' or 'down' for axis=0.
"""
if invalid_val is np.nan:
mask = ~np.isnan(a)
else:
mask = a!=invalid_val
justified_mask = np.sort(mask,axis=axis)
if (side=='up') | (side=='left'):
justified_mask = np.flip(justified_mask,axis=axis)
out = np.full(a.shape, invalid_val)
if axis==1:
out[justified_mask] = a[mask]
else:
out.T[justified_mask.T] = a.T[mask.T]
return out
def shift_values(df, leading_chars):
"""Shifts all values in columns with common leading chars to the
left if there are NaN values.
Example: Given a row of [NaN, 1, NaN, 2]
the values are shifted to [1, 2, NaN, NaN]
"""
cols = df.columns[df.columns.str.startswith(leading_chars)]
df[cols] = justify(df[cols].to_numpy(), invalid_val=np.nan, axis=1, side='left')
return df
df = shift_values(df, 's_')
print("______ RESULT ______")
print(df.head())
______ RESULT ______
key s_1 s_2 s_3 s_4
0 1 1.0 2.0 NaN NaN
1 1 10.0 20.0 25.0 NaN
2 1 10.0 25.0 NaN NaN
Upvotes: 1