Reputation: 369
I have a CSV file with the following content:
Timestamp (UTC),Temperature,
7/6/2021 8:05:00 PM,78.40,
and I am creating a Parquet file from it. Unfortunately, the type of the "Timestamp (UTC)" column always ends up as VARCHAR
(which is a string in Python), but I need TIMESTAMPMILLI.
Here is the script I use:
import pyarrow.csv as pv
import pyarrow.parquet as pq
import pyarrow as pa
...
table = pv.read_csv(csv, convert_options=pv.ConvertOptions(timestamp_parsers={"%-m/%-d/%Y %-I:%M:%S %p"}))
timestamps = table[0]
...
table.append(pa.table([timestamps, units], names=["time", "unit"]))
pq.write_table(pa.concat_tables(table), f'./{device_id}.parquet')
My timestamps
are strings; I can see that in the debugger. What is the proper way to convert them to datetime? Which Python type would actually end up as TIMESTAMPMILLI
in the Parquet file?
Upvotes: 0
Views: 1072
Reputation: 369
The solution is to use the compute module from pyarrow:
import pyarrow.compute as pc
...
# %I (12-hour clock) is the directive that pairs with the AM/PM marker %p
timestamps = pc.strptime(table[0], format='%m/%d/%Y %I:%M:%S %p', unit='s')
This results in the following column definition in the Parquet file:
optional int64 field_id=-1 time (Timestamp(isAdjustedToUTC=false, timeUnit=milliseconds, is_from_converted_type=false, force_set_converted_type=false));
Upvotes: 1