A H

Reputation: 2570

pandas shift rows NaNs

Say we have a dataframe set up as follows:

import numpy as np
import pandas as pd

x = pd.DataFrame(np.random.randint(1, 10, 30).reshape(5,6),
                 columns=[f'col{i}' for i in range(6)])
x['col6'] = np.nan
x['col7'] = np.nan

    col0    col1    col2    col3    col4    col5    col6    col7
 0   6       5        1       5       2       4      NaN    NaN
 1   8       8        9       6       7       2      NaN    NaN
 2   8       3        9       6       6       6      NaN    NaN
 3   8       4        4       4       8       9      NaN    NaN
 4   5       3        4       3       8       7      NaN    NaN     

After calling x.shift(2, axis=1), col2 through col5 are filled correctly, but col6 and col7 stay NaN instead of receiving the values from col4 and col5. How can I overwrite the NaN values in col6 and col7 with the values from col4 and col5? Is this a bug or intended behavior?

    col0    col1    col2    col3    col4    col5    col6    col7
0   NaN      NaN    6.0     5.0     1.0      5.0    NaN     NaN
1   NaN      NaN    8.0     8.0     9.0      6.0    NaN     NaN
2   NaN      NaN    8.0     3.0     9.0      6.0    NaN     NaN
3   NaN      NaN    8.0     4.0     4.0      4.0    NaN     NaN
4   NaN      NaN    5.0     3.0     4.0      3.0    NaN     NaN

Upvotes: 6

Views: 3423

Answers (1)

EdChum

Reputation: 394129

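This may be a bug. As a workaround, you can use np.roll. Note that np.roll is circular: the last two columns wrap around to the front, and it matches a shift here only because those columns are all NaN.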

In[11]:
x.apply(lambda x: np.roll(x, 2), axis=1)

Out[11]: 
   col0  col1  col2  col3  col4  col5  col6  col7
0   NaN   NaN   6.0   5.0   1.0   5.0   2.0   4.0
1   NaN   NaN   8.0   8.0   9.0   6.0   7.0   2.0
2   NaN   NaN   8.0   3.0   9.0   6.0   6.0   6.0
3   NaN   NaN   8.0   4.0   4.0   4.0   8.0   9.0
4   NaN   NaN   5.0   3.0   4.0   3.0   8.0   7.0

For speed, it's probably quicker to call np.roll on the whole DataFrame and pass the result as the data argument to the DataFrame constructor, reusing the existing columns:

In[12]:
x = pd.DataFrame(np.roll(x, 2, axis=1), columns = x.columns)
x

Out[12]: 
   col0  col1  col2  col3  col4  col5  col6  col7
0   NaN   NaN   6.0   5.0   1.0   5.0   2.0   4.0
1   NaN   NaN   8.0   8.0   9.0   6.0   7.0   2.0
2   NaN   NaN   8.0   3.0   9.0   6.0   6.0   6.0
3   NaN   NaN   8.0   4.0   4.0   4.0   8.0   9.0
4   NaN   NaN   5.0   3.0   4.0   3.0   8.0   7.0
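One detail worth watching with the constructor approach: passing only columns resets the row labels to a default RangeIndex. The sketch below is a self-contained variant that also passes the original index. The frame `df` and its values are made up for illustration (a deterministic stand-in for the random data in the question), and the custom row labels are an assumption added to make the difference visible.

```python
import numpy as np
import pandas as pd

# Deterministic stand-in for the question's random frame (values are made up).
df = pd.DataFrame(np.arange(1, 31).reshape(5, 6),
                  columns=[f'col{i}' for i in range(6)],
                  index=list('abcde'))  # non-default labels, for illustration
df['col6'] = np.nan
df['col7'] = np.nan

# Roll every row two positions to the right. The two trailing NaN columns
# wrap around to the front, so the result looks like a shift by 2.
rolled = pd.DataFrame(np.roll(df.to_numpy(), 2, axis=1),
                      index=df.index,      # keep the original row labels
                      columns=df.columns)  # keep the original column names

print(rolled)
```

Because the frame mixes int and float columns, `to_numpy()` yields a float array, which is why the output shows values like 6.0 rather than 6.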

Timings:

In[13]:

%timeit pd.DataFrame(np.roll(x, 2, axis=1), columns = x.columns)
%timeit x.fillna(0).astype(int).shift(2, axis=1)

10000 loops, best of 3: 117 µs per loop
1000 loops, best of 3: 418 µs per loop

So constructing a new df from the result of np.roll is quicker than first filling the NaN values, casting to int, and then shifting.
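The np.roll approach relies on the trailing columns being NaN, since np.roll wraps values around instead of dropping them. If the last columns held real data, those values would reappear in the first columns. A minimal sketch of a non-circular alternative follows; `shift_columns` is a hypothetical helper name, not a pandas or NumPy function, and it assumes numeric data and a non-negative shift.

```python
import numpy as np
import pandas as pd

def shift_columns(frame, periods):
    """Shift values right by `periods` columns, filling vacated cells with NaN.

    Hypothetical helper for illustration: assumes numeric data and
    periods >= 0. Values pushed past the last column are discarded.
    """
    if periods < 0:
        raise ValueError("this sketch only handles non-negative shifts")
    values = frame.to_numpy(dtype=float)
    out = np.full_like(values, np.nan)
    n_cols = values.shape[1]
    if periods < n_cols:
        # Copy the columns that survive the shift into their new positions.
        out[:, periods:] = values[:, :n_cols - periods]
    return pd.DataFrame(out, index=frame.index, columns=frame.columns)

small = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=['a', 'b', 'c'])
shifted = shift_columns(small, 1)
wrapped = pd.DataFrame(np.roll(small.to_numpy(), 1, axis=1),
                       columns=small.columns)
print(shifted)  # column 'a' is NaN; the old 'c' values are dropped
print(wrapped)  # column 'a' holds the old 'c' values (wraparound)
```

On the frame from the question the two approaches give the same result; they differ only when the columns that roll off the end contain non-NaN values.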

Upvotes: 4
