Jie Jenn

Reputation: 505

Pandas DataFrame.shift returns incorrect result when shifting along the column axis

I am seeing something odd and am not sure whether it is a bug (hopefully not). When I call DataFrame.shift along the columns (axis=1), the columns are either shifted to the wrong position or filled with incorrect values (see the output below).

Does anyone know whether I am missing something, or is this simply a bug in the library?

import numpy as np
import pandas as pd

# Example 2
ind = pd.date_range('01 / 01 / 2019', periods=5, freq='12H')
df2 = pd.DataFrame({"A": [1, 2, 3, 4, 5],
                   "B": [10, 20, np.nan, 40, 50],
                   "C": [11, 22, 33, np.nan, 55],
                   "D": [-11, -24, -51, -36, -2],
                   'D1': [False] * 5,
                   'E': [True, False, False, True, True]},
                  index=ind)

df2.shift(freq='12H', periods=1, axis=1)
df2.shift(periods=1, axis=1)

print(df2.shift(periods=1, axis=1)) # shift by column -> incorrect
# print(df2.shift(periods=1, axis=0)) # correct

Output:

                     A     B     C   D     D1      E
2019-01-01 00:00:00  1  10.0  11.0 -11  False   True
2019-01-01 12:00:00  2  20.0  22.0 -24  False  False
2019-01-02 00:00:00  3   NaN  33.0 -51  False  False
2019-01-02 12:00:00  4  40.0   NaN -36  False   True
2019-01-03 00:00:00  5  50.0  55.0  -2  False   True

                      A   B     C    D   D1      E
2019-01-01 00:00:00 NaN NaN  10.0  1.0  NaN  False
2019-01-01 12:00:00 NaN NaN  20.0  2.0  NaN  False
2019-01-02 00:00:00 NaN NaN   NaN  3.0  NaN  False
2019-01-02 12:00:00 NaN NaN  40.0  4.0  NaN  False
2019-01-03 00:00:00 NaN NaN  50.0  5.0  NaN  False

Upvotes: 1

Views: 804

Answers (1)

jezrael

Reputation: 862641

You are right, it is a bug. The problem is that DataFrame.shift with axis=1 shifts each column into the next column with the same dtype instead of into the adjacent column.

In the sample, columns A and D hold integers, so A is moved to D; columns B and C hold floats, so B is moved to C; the same happens with the boolean columns D1 and E.
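To see which columns share a dtype, and therefore which ones ended up shifted into each other, the column names can be grouped by dtype. This is only a small sketch: it rebuilds the question's data without the datetime index, and the names `df` and `groups` are chosen here for illustration.

```python
import numpy as np
import pandas as pd

# Same columns as df2 in the question (datetime index omitted for brevity).
df = pd.DataFrame({"A": [1, 2, 3, 4, 5],
                   "B": [10, 20, np.nan, 40, 50],
                   "C": [11, 22, 33, np.nan, 55],
                   "D": [-11, -24, -51, -36, -2],
                   "D1": [False] * 5,
                   "E": [True, False, False, True, True]})

# Group column names by dtype. Columns in the same group are the ones
# that were shifted into each other in the output from the question.
groups = {}
for name, dtype in df.dtypes.items():
    groups.setdefault(str(dtype), []).append(name)

for dtype, names in groups.items():
    print(dtype, names)
```

The groups line up with the pairs described above: A with D, B with C, and D1 with E.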

A workaround is to convert all columns to object dtype, shift, and then restore the dtypes with DataFrame.infer_objects:

df3 = df2.astype(object).shift(1, axis=1).infer_objects()
print (df3)
                      A  B     C     D  D1      E
2019-01-01 00:00:00 NaN  1  10.0  11.0 -11  False
2019-01-01 12:00:00 NaN  2  20.0  22.0 -24  False
2019-01-02 00:00:00 NaN  3   NaN  33.0 -51  False
2019-01-02 12:00:00 NaN  4  40.0   NaN -36  False
2019-01-03 00:00:00 NaN  5  50.0  55.0  -2  False

print (df3.dtypes)
A     float64
B       int64
C     float64
D     float64
D1      int64
E        bool
dtype: object
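If the workaround is needed in more than one place, it could be wrapped in a small helper and checked against the expected result. The sketch below assumes a reduced three-column frame rather than the full df2, and `shift_columns_as_object` is a hypothetical name, not a pandas function.

```python
import numpy as np
import pandas as pd


def shift_columns_as_object(frame, periods=1):
    """Hypothetical helper: shift along axis=1 through object dtype."""
    # Casting to object puts every column in one dtype, so the shift
    # moves values to the adjacent column; infer_objects then restores
    # a more specific dtype per column where possible.
    return frame.astype(object).shift(periods, axis=1).infer_objects()


# Reduced example with one int, one float and one bool column.
df = pd.DataFrame({"A": [1, 2, 3],
                   "B": [10.0, np.nan, 30.0],
                   "E": [True, False, True]})
shifted = shift_columns_as_object(df)

# Each column should now hold the values of its left-hand neighbour.
cols = list(df.columns)
for left, right in zip(cols[:-1], cols[1:]):
    pd.testing.assert_series_equal(shifted[right], df[left],
                                   check_names=False, check_dtype=False)

print(shifted)
print(shifted.dtypes)
```

The loop raises an AssertionError if any column did not receive its neighbour's values, which makes it a quick way to confirm the behaviour on a given pandas version.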

With axis=0, values only move within a column, so the dtype is always the same and the shift works correctly.
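For comparison, a row-wise shift on a reduced frame can be sketched as follows. The frame and the name `down` are assumptions made for illustration; the point is only that each column keeps its own values, moved down by one row.

```python
import numpy as np
import pandas as pd

# Reduced example with one int, one float and one bool column.
df = pd.DataFrame({"A": [1, 2, 3],
                   "B": [10.0, np.nan, 30.0],
                   "E": [True, False, True]})

# Shift down by one row: the first row becomes missing and every other
# row takes the values of the row above it, column by column.
down = df.shift(1, axis=0)

print(down)
print(down.dtypes)
```

Note that the dtypes of `down` may differ from those of `df`, since integer and boolean columns have to be converted to hold the missing first row.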

Upvotes: 2
