Nick Ponomar

Reputation: 369

pyarrow.csv read out timestamp or convert string to timestamp

I have a csv file with the following content:

Timestamp (UTC),Temperature,
7/6/2021 8:05:00 PM,78.40,

and I am creating a Parquet file from it. Unfortunately, the type of "Timestamp (UTC)" always ends up as VARCHAR (a string in Python), but I need TIMESTAMPMILLI. Here is the script I use:

import pyarrow.csv as pv
import pyarrow.parquet as pq
import pyarrow as pa
...
table = pv.read_csv(csv, convert_options=pv.ConvertOptions(timestamp_parsers={"%-m/%-d/%Y %-I:%M:%S %p"}))
timestamps = table[0]
...
table.append(pa.table([timestamps, units], names=["time", "unit"]))
pq.write_table(pa.concat_tables(table), f'./{device_id}.parquet')

My timestamps are strings, as I can see in the debug view. What would be the proper way to convert them to datetime? Which Python type would actually end up as TIMESTAMPMILLI in the Parquet file?

Upvotes: 0

Views: 1072

Answers (1)

Nick Ponomar

Reputation: 369

The solution is to use the compute module from pyarrow:

import pyarrow.compute as pc
...
# %I (12-hour clock) is used instead of %H so that the AM/PM marker (%p) is applied
timestamps = pc.strptime(table[0], format='%m/%d/%Y %I:%M:%S %p', unit='s')

This results in the following schema in the Parquet file (Parquet has no seconds unit, so pyarrow writes second-resolution timestamps as milliseconds):

optional int64 field_id=-1 time (Timestamp(isAdjustedToUTC=false, timeUnit=milliseconds, is_from_converted_type=false, force_set_converted_type=false));

Upvotes: 1
