Georg Heiler

left shift all the columns in a pandas dataframe

I have a dataframe with the columns id, level_1__value, level_2__value, level_3__value and last_not_null, where some of the level values are NaN.

How can I move the existing values in each row to the left, i.e. left-shift the level_*__value columns row by row so that the NaN values are removed from between and before the values and end up on the right (last_not_null stays as it is)?

The desired result would be similar to:

id,level_1__value,level_2__value,level_3__value,last_not_null
1,1,nan,nan,2
2,5,nan,nan,5
3,3,5,nan,6
4,4,7,2,2
5,5,3,nan,3
...

The following code defines the dataframe:

import pandas as pd
import numpy as np
from numpy import nan

df = pd.DataFrame({'id': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6, 6: 7, 7: 8, 8: 9}, 'level_1__value': {0: 1.0, 1: nan, 2: 3.0, 3: 4.0, 4: 5.0, 5: nan, 6: 7.0, 7: nan, 8: 34.0}, 'level_2__value': {0: nan, 1: nan, 2: 5.0, 3: 7.0, 4: nan, 5: nan, 6: nan, 7: nan, 8: nan}, 'level_3__value': {0: nan, 1: 5.0, 2: nan, 3: 2.0, 4: 3.0, 5: nan, 6: nan, 7: 6.0, 8: nan}, 'last_not_null': {0: 2.0, 1: 5.0, 2: 6.0, 3: 2.0, 4: 3.0, 5: 3.0, 6: 10.0, 7: 6.0, 8: 34.0}})
print(df)  # or display(df) in a Jupyter notebook

Answers (1)

jezrael

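You can use a custom function with Series.dropna, convert the result to a NumPy array (this drops the original column labels, so the remaining values are renumbered from 0), and add Series.reindex to handle an edge case: if every row has at least one NaN, apply would otherwise return fewer columns than there are in c, and the assignment back to df[c] would not match the input column count: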

c = ['level_1__value', 'level_2__value', 'level_3__value']
# drop the NaNs of one row, renumber the remaining values 0..k-1, then pad with NaN up to len(c)
f = lambda x: pd.Series(x.dropna().to_numpy()).reindex(range(len(c)))
df[c] = df[c].apply(f, axis=1)
print(df)
   id  level_1__value  level_2__value  level_3__value  last_not_null
0   1             1.0             NaN             NaN            2.0
1   2             5.0             NaN             NaN            5.0
2   3             3.0             5.0             NaN            6.0
3   4             4.0             7.0             2.0            2.0
4   5             5.0             3.0             NaN            3.0
5   6             NaN             NaN             NaN            3.0
6   7             7.0             NaN             NaN           10.0
7   8             6.0             NaN             NaN            6.0
8   9            34.0             NaN             NaN           34.0
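A minimal sketch of the edge case that the reindex step handles, using a small hypothetical frame demo (columns a, b, c, not taken from the question) in which every row contains at least one NaN. Without reindex, the longest row holds only two values, so apply returns two columns instead of three. Here the padded result is assigned back with .to_numpy(), so the assignment is purely positional and does not depend on column labels.

```python
import numpy as np
import pandas as pd

# Hypothetical data: every row has at least one NaN
demo = pd.DataFrame({'a': [np.nan, 1.0], 'b': [2.0, np.nan], 'c': [3.0, np.nan]})
cols = ['a', 'b', 'c']

# Without reindex, each row only yields as many values as it has non-NaN entries
no_pad = demo[cols].apply(lambda x: pd.Series(x.dropna().to_numpy()), axis=1)
print(no_pad.shape)  # (2, 2): one column short

# With reindex, every row is padded with NaN up to len(cols) positions
padded = demo[cols].apply(
    lambda x: pd.Series(x.dropna().to_numpy()).reindex(range(len(cols))), axis=1
)
print(padded.shape)  # (2, 3)

# Assign positionally; rows become [2, 3, NaN] and [1, NaN, NaN]
demo[cols] = padded.to_numpy()
print(demo)
```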

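If performance is important, use Divakar's justify function, which works on the whole NumPy array at once instead of calling a Python function for every row: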

#https://stackoverflow.com/a/44559180/2901002
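def justify(a, invalid_val=0, axis=1, side='left'):
    """
    Justifies a 2D array

    Parameters
    ----------
    a : ndarray
        Input array to be justified
    invalid_val : scalar
        Value treated as empty (e.g. np.nan); it is pushed to the opposite side
    axis : int
        Axis along which justification is to be made
    side : str
        Direction of justification. It could be 'left', 'right', 'up', 'down'
        It should be 'left' or 'right' for axis=1 and 'up' or 'down' for axis=0.

    """

    # NaN != NaN, so NaN needs np.isnan instead of a comparison
    if isinstance(invalid_val, float) and np.isnan(invalid_val):
        mask = ~np.isnan(a)
    else:
        mask = a != invalid_val
    # sorting booleans moves True (valid) entries to the end of the axis
    justified_mask = np.sort(mask, axis=axis)
    if side in ('up', 'left'):
        justified_mask = np.flip(justified_mask, axis=axis)
    # output dtype must hold both the data and the fill value
    out = np.full(a.shape, invalid_val, dtype=np.result_type(a, invalid_val))
    if axis == 1:
        out[justified_mask] = a[mask]
    else:
        out.T[justified_mask.T] = a.T[mask.T]
    return out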

c = ['level_1__value', 'level_2__value', 'level_3__value']
df[c] = justify(df[c].to_numpy(), invalid_val=np.nan)
print(df)
   id  level_1__value  level_2__value  level_3__value  last_not_null
0   1             1.0             NaN             NaN            2.0
1   2             5.0             NaN             NaN            5.0
2   3             3.0             5.0             NaN            6.0
3   4             4.0             7.0             2.0            2.0
4   5             5.0             3.0             NaN            3.0
5   6             NaN             NaN             NaN            3.0
6   7             7.0             NaN             NaN           10.0
7   8             6.0             NaN             NaN            6.0
8   9            34.0             NaN             NaN           34.0
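The heart of justify is sorting a boolean mask. The sketch below, on a small hypothetical array a (not the question's data), walks through the axis=1, side='left' path step by step; because boolean indexing reads and writes in row-major order, each row keeps the original order of its non-NaN values.

```python
import numpy as np

# Hypothetical input with NaNs scattered in each row
a = np.array([[np.nan, 1.0, np.nan, 2.0],
              [3.0, np.nan, 4.0, np.nan]])

# 1. True where a value is present
mask = ~np.isnan(a)

# 2. Sorting booleans per row puts False before True, i.e. present values on the right
justified_mask = np.sort(mask, axis=1)

# 3. Flip each row so the True positions sit on the left instead
justified_mask = np.flip(justified_mask, axis=1)

# 4. Read the values row by row and write them into the justified positions
out = np.full(a.shape, np.nan)
out[justified_mask] = a[mask]
print(out)  # rows become [1, 2, nan, nan] and [3, 4, nan, nan]
```

With side='right' the flip in step 3 is skipped, so the values end up against the right edge instead.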
