Reputation: 1349
I am able to convert a NumPy-array column of pandas Timestamps to an int array:
import numpy as np
import pandas as pd
df = pd.DataFrame({'a': [pd.Timestamp(2019, 1, 11, 5, 30, 1), pd.Timestamp(2019, 1, 11, 5, 30, 1), pd.Timestamp(2019, 1, 11, 5, 30, 1)], 'b': [np.nan, 5.1, 1.6]})
a = df.to_numpy()
a
# array([[Timestamp('2019-01-11 05:30:01'), nan],
# [Timestamp('2019-01-11 05:30:01'), 5.1],
# [Timestamp('2019-01-11 05:30:01'), 1.6]], dtype=object)
a[:,0] = a[:,0].astype('datetime64').astype(np.int64)
# array([[1547184601000000, nan],
# [1547184601000000, 5.1],
# [1547184601000000, 1.6]], dtype=object)
For this array a, I would like to convert column 0 back to pandas Timestamps. As the array is quite big and my overall process quite time-consuming, I would like to avoid using Python loops, apply, lambdas, or similar constructs. Instead, I am looking for speed-optimized, native NumPy-based functions.
I already tried things like:
a[:,0].astype('datetime64')
(result: ValueError: Converting an integer to a NumPy datetime requires a specified unit)
and:
import calendar
calendar.timegm(a[:,0].utctimetuple())
(result: AttributeError: 'numpy.ndarray' object has no attribute 'utctimetuple')
How can I convert my column a[:,0] back to
array([[Timestamp('2019-01-11 05:30:01'), nan],
[Timestamp('2019-01-11 05:30:01'), 5.1],
[Timestamp('2019-01-11 05:30:01'), 1.6]], dtype=object)
in a speed-optimized way?
Upvotes: 0
Views: 1005
Reputation: 1781
Let's review the docs:
Immutable ndarray of datetime64 data, represented internally as int64, and which can be boxed to Timestamp objects that are subclasses of datetime and carry metadata such as frequency information.
So, we can use DatetimeIndex, and then convert it back by using np.int64.
In [18]: b = a[:,0]
In [19]: index = pd.DatetimeIndex(b)
In [21]: index.astype(np.int64)
Out[21]: Int64Index([1547184601000000000, 1547184601000000000, 1547184601000000000], dtype='int64')
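To go the other way, from the integer column in the question back to Timestamp objects inside the object array, here is a minimal sketch. It assumes those integers are microseconds since the epoch, as the question's output suggests; note that a plain DatetimeIndex would read raw int64 values as nanoseconds, hence the explicit unit via pd.to_datetime:
ts = pd.to_datetime(a[:,0].astype(np.int64), unit='us')  # vectorized conversion to a DatetimeIndex
a[:,0] = ts.astype(object)                               # astype(object) boxes the values into Timestamp objects
Both steps are vectorized, so no per-element Python loop is involved.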
Upvotes: 1