Reputation: 983
Let's say I have a pyarrow table with a column timestamp containing float64 values.
These floats are actually timestamps expressed in seconds.
For instance:
import pyarrow as pa
my_table = pa.table({'timestamp': pa.array([1600419000.477,1600419001.027])})
I read about Parquet logical types in the documentation. How can I convert these float values to the TIMESTAMP logical type? I couldn't find any documentation on how to do this.
Thank you for your help, and have a good day.
Upvotes: 0
Views: 1223
Reputation: 139232
You will need to convert the floats into an actual timestamp type in pyarrow; a timestamp column is then automatically written with the Parquet TIMESTAMP logical type.
Using the pyarrow.compute module, this conversion can also be done in pyarrow itself (a bit less ergonomic than doing the conversion in pandas, but it avoids a round trip to pandas and back):
>>> import pyarrow as pa
>>> import pyarrow.compute as pc
>>> arr = pa.array([1600419000.477,1600419001.027])
>>> pc.multiply(arr, pa.scalar(1000.)).cast("int64").cast(pa.timestamp('ms'))
<pyarrow.lib.TimestampArray object at 0x7fe5ec3df588>
[
2020-09-18 08:50:00.477,
2020-09-18 08:50:01.027
]
Upvotes: 2
Reputation: 13932
I don't think you can cast floats directly to timestamps within Arrow.
Arrow stores timestamps as 64-bit integers counting units of a given precision (s, ms, us, ns) since the Unix epoch. In your case, you have to multiply your float seconds by the factor for the precision you want (1000 for ms), then convert to int64 and cast to timestamp.
Here's an example using pandas:
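import pyarrow as pa

(
    pa.array([1600419000.477, 1600419001.027])
    .to_pandas()                   # float64 Series of seconds
    .mul(1000)                     # seconds -> milliseconds
    .round()                       # remove float noise before the integer conversion
    .astype('int64')               # explicit 64-bit ints ('long' is 32-bit on Windows)
    .pipe(pa.Array.from_pandas)    # back to an Arrow int64 array
    .cast(pa.timestamp('ms'))      # reinterpret as epoch milliseconds
)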
Which gives you:
<pyarrow.lib.TimestampArray object at 0x7fb5025b6a08>
[
2020-09-18 08:50:00.477,
2020-09-18 08:50:01.027
]
Upvotes: 1