destinychoice

Reputation: 45

shift columns one step if column value is empty

Suppose I have a dataframe looking something like this:

  col1 col2 col3 col4
0    A    B    F    O
1    A         G    Q
2    A    C    G    P
3    A         H     
4    A    D    I     
5    A    D    I     
6    A         J    U
7    A    E         J

How can I shift the values one column to the left wherever a cell is empty, so that the result looks like this?

  col1 col2 col3 col4
0    A    B    F    O
1    A    G    Q     
2    A    C    G    P
3    A    H          
4    A    D    I     
5    A    D    I     
6    A    J    U     
7    A    E    J     

I thought I could check the current column and, if its value is empty, take the value from the next column and then blank out that next column.

for col in df.columns:    
    df[col] = np.where((df[col] == ''), df[f'col{int(col[-1])+1}'], df[col])
    df[f'col{int(col[-1])+1}'] = np.where((df[col] == ''), '', df[col])

But I am failing somewhere. Sample df below.

import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        'col1': ['A','A','A','A','A','A','A','A'],
        'col2': ['B','','C','','D','D','','E'],
        'col3': ['F','G','G','H','I','I','J',''],
        'col4': ['O','Q','P','','','','U','J']
    }
)

Upvotes: 1

Views: 1155

Answers (3)

Vishnudev Krishnadas

Reputation: 10960

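Replace the empty strings with NaN:

df = df.replace('', np.nan)

Then apply dropna row-wise, so the remaining values in each row are packed to the left:

odf = df.apply(lambda x: pd.Series(x.dropna().values), axis=1)

To retain the column names:

odf.columns = df.columns

NOTE: It is generally better to represent missing data with NaN than with empty strings, since pandas methods such as dropna and isna recognize it.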

Output

  col1 col2 col3 col4
0    A    B    F    O
1    A    G    Q  NaN
2    A    C    G    P
3    A    H  NaN  NaN
4    A    D    I  NaN
5    A    D    I  NaN
6    A    J    U  NaN
7    A    E    J  NaN
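
For reference, the three steps above can be put together as a rough, self-contained sketch. The helper name shift_left is made up for illustration and is not part of pandas. The extra reindex call is my own addition: it pads the result back to the original width, which I assume is wanted in case no row happens to be completely filled (otherwise assigning the column names would fail with a length mismatch).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        'col1': ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A'],
        'col2': ['B', '', 'C', '', 'D', 'D', '', 'E'],
        'col3': ['F', 'G', 'G', 'H', 'I', 'I', 'J', ''],
        'col4': ['O', 'Q', 'P', '', '', '', 'U', 'J'],
    }
)


def shift_left(frame):
    """Hypothetical helper: pack each row's non-empty values to the left."""
    # Treat empty strings as missing values.
    cleaned = frame.replace('', np.nan)
    # Drop the missing values row by row; each row becomes a shorter Series.
    shifted = cleaned.apply(lambda row: pd.Series(row.dropna().values), axis=1)
    # Pad back to the original number of columns (assumption: keep the width).
    shifted = shifted.reindex(columns=range(frame.shape[1]))
    shifted.columns = frame.columns
    return shifted


odf = shift_left(df)
print(odf)
```

The row-wise apply is easy to read but runs a Python function per row, so it may be slow on large frames.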

Upvotes: 3

ThePyGuy

Reputation: 18426

You can count the number of empty values in each column, sort the columns by that count, and then reorder the dataframe accordingly.

counts = {}
for col in df.columns.to_list():
    # Count the empty strings in each column (based on the example you provided).
    counts[col] = (df[col] == '').sum()
# Sort the dictionary by count.
counts = dict(sorted(counts.items(), key=lambda item: item[1]))
# Reorder the dataframe's columns to match.
df = df[[*counts.keys()]]
df

  col1 col3 col2 col4
0    A    F    B    O
1    A    G         Q
2    A    G    C    P
3    A    H          
4    A    I    D     
5    A    I    D     
6    A    J         U
7    A         E    J
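
The same column reordering can probably be written more compactly. This is only a sketch of the counting idea above, and the variable names are my own. Note that, as the output shows, it moves whole columns and does not shift values within a row.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'col1': ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A'],
        'col2': ['B', '', 'C', '', 'D', 'D', '', 'E'],
        'col3': ['F', 'G', 'G', 'H', 'I', 'I', 'J', ''],
        'col4': ['O', 'Q', 'P', '', '', '', 'U', 'J'],
    }
)

# Number of empty strings per column.
empty_counts = (df == '').sum()
# A stable sort keeps the original order for columns with equal counts.
column_order = empty_counts.sort_values(kind='stable').index
reordered = df[column_order]
print(reordered)
```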

Upvotes: 0

Quang Hoang

Reputation: 150765

One way is to use np.argsort:

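s = df.to_numpy()
# A stable sort of the boolean mask moves empty cells (True) to the right
# while keeping the non-empty values in their original order.
orders = np.argsort(s=='', axis=1, kind='mergesort')

# Reorder each row by its own column order.
df[:] = s[np.arange(len(s))[:,None],orders]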

Output:

  col1 col2 col3 col4
0    A    B    F    O
1    A    G    Q     
2    A    C    G    P
3    A    H          
4    A    D    I     
5    A    D    I     
6    A    J    U     
7    A    E    J     
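
To make the argsort trick easier to follow, here is a minimal self-contained sketch of the same idea. It swaps the fancy indexing for np.take_along_axis, which should be equivalent here, and builds a new frame instead of assigning in place. The variable names are mine. Sorting the boolean mask works because False sorts before True, and a stable sort preserves the left-to-right order of the non-empty values.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        'col1': ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A'],
        'col2': ['B', '', 'C', '', 'D', 'D', '', 'E'],
        'col3': ['F', 'G', 'G', 'H', 'I', 'I', 'J', ''],
        'col4': ['O', 'Q', 'P', '', '', '', 'U', 'J'],
    }
)

values = df.to_numpy()
# True where a cell is empty.
is_empty = values == ''
# Per-row column order: non-empty cells first, empty cells last.
order = np.argsort(is_empty, axis=1, kind='stable')
# Pick the values of each row in that order.
shifted = pd.DataFrame(
    np.take_along_axis(values, order, axis=1),
    index=df.index,
    columns=df.columns,
)
print(shifted)
```

Unlike the dropna approach, this keeps the empty strings rather than introducing NaN, and it avoids a Python-level loop over rows.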

Note: A very similar approach can be found in this question.

Upvotes: 2
