Reputation: 505
I am seeing something odd and am not sure whether it is a bug (hopefully not). When I call DataFrame.shift
along the columns (axis=1), either the columns are shifted incorrectly or the returned values are wrong (see the output below).
Does anyone know whether I am missing something, or is this simply a bug in the library?
# Example 2
import numpy as np
import pandas as pd

ind = pd.date_range('01 / 01 / 2019', periods=5, freq='12H')
df2 = pd.DataFrame({"A": [1, 2, 3, 4, 5],
                    "B": [10, 20, np.nan, 40, 50],
                    "C": [11, 22, 33, np.nan, 55],
                    "D": [-11, -24, -51, -36, -2],
                    'D1': [False] * 5,
                    'E': [True, False, False, True, True]},
                   index=ind)
print(df2)  # original frame
print(df2.shift(periods=1, axis=1))  # shift by column -> incorrect
# print(df2.shift(periods=1, axis=0))  # shift by row -> correct
Output:
                     A     B     C    D     D1      E
2019-01-01 00:00:00  1  10.0  11.0  -11  False   True
2019-01-01 12:00:00  2  20.0  22.0  -24  False  False
2019-01-02 00:00:00  3   NaN  33.0  -51  False  False
2019-01-02 12:00:00  4  40.0   NaN  -36  False   True
2019-01-03 00:00:00  5  50.0  55.0   -2  False   True
                       A    B     C    D   D1      E
2019-01-01 00:00:00  NaN  NaN  10.0  1.0  NaN  False
2019-01-01 12:00:00  NaN  NaN  20.0  2.0  NaN  False
2019-01-02 00:00:00  NaN  NaN   NaN  3.0  NaN  False
2019-01-02 12:00:00  NaN  NaN  40.0  4.0  NaN  False
2019-01-03 00:00:00  NaN  NaN  50.0  5.0  NaN  False
[Finished in 0.4s]
Upvotes: 1
Views: 804
Reputation: 862641
You are right, it is a bug. The problem is that DataFrame.shift with axis=1 moves each column to the next column with the same dtype, not to the adjacent column.
In the sample, columns A and D hold integers, so A is moved to D. Columns B and C hold floats, so B is moved to C. The same happens with the boolean columns D1 and E.
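To see which columns share a dtype, and so which ones end up shifted into each other, the columns can be grouped by dtype. This is only a minimal sketch: it rebuilds the question's data with a default integer index instead of the dates, and the name columns_by_dtype is just illustrative.

```python
import numpy as np
import pandas as pd

# Same data as df2 in the question, but with a default integer index
# (an assumption made to keep the example short).
df = pd.DataFrame({"A": [1, 2, 3, 4, 5],
                   "B": [10, 20, np.nan, 40, 50],
                   "C": [11, 22, 33, np.nan, 55],
                   "D": [-11, -24, -51, -36, -2],
                   "D1": [False] * 5,
                   "E": [True, False, False, True, True]})

# Collect the column names for each dtype, keeping the column order.
columns_by_dtype = {}
for name, dtype in df.dtypes.items():
    columns_by_dtype.setdefault(str(dtype), []).append(name)

for dtype, names in columns_by_dtype.items():
    print(dtype, names)
```

Each list printed here is one group of columns that the buggy shift treats as neighbours.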
A workaround is to convert all columns to object, shift, and then restore the dtypes with DataFrame.infer_objects:
df3 = df2.astype(object).shift(1, axis=1).infer_objects()
print (df3)
                       A  B     C     D   D1      E
2019-01-01 00:00:00  NaN  1  10.0  11.0  -11  False
2019-01-01 12:00:00  NaN  2  20.0  22.0  -24  False
2019-01-02 00:00:00  NaN  3   NaN  33.0  -51  False
2019-01-02 12:00:00  NaN  4  40.0   NaN  -36  False
2019-01-03 00:00:00  NaN  5  50.0  55.0   -2  False
print (df3.dtypes)
A float64
B int64
C float64
D float64
D1 int64
E bool
dtype: object
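If this is needed in more than one place, the same workaround could be wrapped in a small helper. The sketch below is one possible way to do that; the function name shift_columns is hypothetical (not part of pandas), and the frame again uses a default integer index rather than the dates from the question.

```python
import numpy as np
import pandas as pd


def shift_columns(df, periods=1):
    """Shift values across columns regardless of dtype (hypothetical helper)."""
    # With every column cast to object there is a single dtype,
    # so the shift moves each column to its positional neighbour.
    shifted = df.astype(object).shift(periods, axis=1)
    # Let pandas pick better dtypes for the shifted object columns.
    return shifted.infer_objects()


# Same data as df2 in the question, default index assumed for brevity.
df = pd.DataFrame({"A": [1, 2, 3, 4, 5],
                   "B": [10, 20, np.nan, 40, 50],
                   "C": [11, 22, 33, np.nan, 55],
                   "D": [-11, -24, -51, -36, -2],
                   "D1": [False] * 5,
                   "E": [True, False, False, True, True]})

result = shift_columns(df)
print(result)
print(result.dtypes)
```

Casting to object copies the data and is slower on large frames, so this is probably best kept to cases where the columns really do have mixed dtypes.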
With shift and axis=0, values only move within their own column, where the dtype is always the same, so it works correctly.
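For comparison, here is a small sketch of the row-wise case, using a made-up two-column frame rather than the question's data. Values move down within each column and never cross into another column.

```python
import pandas as pd

# Small illustrative frame (not the question's data): one int and one bool column.
df = pd.DataFrame({"A": [1, 2, 3],
                   "E": [True, False, True]})

# Shift down by one row; the first row of every column becomes missing.
shifted = df.shift(1, axis=0)
print(shifted)
```

The column dtypes may still change to make room for the missing value in the first row, but every value stays in its original column.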
Upvotes: 2