Reputation: 578
Suppose I have a data.frame/tibble as follows:
library(readr)
library(arrow)
# testFyl was originally read from a csv file with readr::read_csv()
testFyl <- structure(list(
BILL_NO = c("36/2015-16", "39/15-16", "771", "254", "731", "610", "200", "23 /2015-16", "21/2015-16", "30/15-16"),
BILL_DT_TIME = structure(c(1438021800, 1436898600, 1438021800, 1436293800, 1437935400, 1437589800, 1436207400, 1438108200, 1437676200, 1437330600), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
BILL_DT = structure(c(16643, 16630, 16643, 16623, 16642, 16638, 16622, 16644, 16639, 16635), class = "Date")),
spec = structure(list(cols = list(BILL_NO = structure(list(), class = c("collector_character", "collector")), BILL_DT_TIME = structure(list(format = ""), class = c("collector_datetime", "collector")), BILL_DT = structure(list(format = ""), class = c("collector_date", "collector"))), default = structure(list(), class = c("collector_guess", "collector")), delim = ","), class = "col_spec"), row.names = c(NA, -10L), class = c("spec_tbl_df", "tbl_df", "tbl", "data.frame"))
testFyl looks like:
# A tibble: 10 x 3
BILL_NO BILL_DT_TIME BILL_DT
<chr> <dttm> <date>
1 36/2015-16 2015-07-27 18:30:00 2015-07-27
2 39/15-16 2015-07-14 18:30:00 2015-07-14
3 771 2015-07-27 18:30:00 2015-07-27
4 254 2015-07-07 18:30:00 2015-07-07
5 731 2015-07-26 18:30:00 2015-07-26
6 610 2015-07-22 18:30:00 2015-07-22
7 200 2015-07-06 18:30:00 2015-07-06
8 23 /2015-16 2015-07-28 18:30:00 2015-07-28
9 21/2015-16 2015-07-23 18:30:00 2015-07-23
10 30/15-16 2015-07-19 18:30:00 2015-07-19
Note that the BILL_DT column has the same dates as the BILL_DT_TIME column, with the time information removed.
Now, write this table in parquet format with
write_parquet(testFyl, "testFyl.parquet")
When reading this parquet file back into R with
read_parquet("testFyl.parquet")
everything is absolutely fine. The table is exactly the same as above, as expected.
However, when I load this parquet file with the following two external parquet file viewing tools, they show dates in formats that I don't understand:
1. ParquetViewer from https://github.com/mukunku/ParquetViewer
Here, the BILL_DT_TIME column shows numbers which are strange to me.
2. Bigdata File Viewer from https://github.com/Eugene-Mark/bigdata-file-viewer
Here, both the BILL_DT_TIME and BILL_DT columns show numbers which I don't understand. These are the numbers that show up when the data.frame is saved with the dput function.
Seeing the date-time (strange) and date (understandable) columns in ParquetViewer, it seems that some formatting could be applied to the date-time column in R before exporting it to parquet format, so that it shows up properly in ParquetViewer. Can anyone help me figure it out?
Edit: Meanwhile, I've raised an issue (feature request) on GitHub at https://github.com/mukunku/ParquetViewer/issues/40
Edit2: The developer has graciously updated ParquetViewer to show timestamps in human-intelligible format. So this issue is resolved.
Upvotes: 3
Views: 16850
Reputation: 1476
That format is called "timestamp". It's a Unix timestamp expressed in microseconds, i.e. the number of microseconds elapsed since 1970-01-01 00:00:00 UTC.
https://www.epochconverter.com/
https://github.com/apache/parquet-format/blob/master/LogicalTypes.md
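To make the raw numbers less mysterious, here is a minimal base R sketch of how they map back to readable values. It assumes the viewer shows the date-time column as microseconds since the Unix epoch (as described above) and the date column as days since the epoch. The variable names `secs`, `micros` and `decoded` are only for illustration; the values are taken from the `dput` output in the question.

```r
# Seconds since 1970-01-01 UTC, as stored by R for BILL_DT_TIME (from dput)
secs <- c(1438021800, 1436898600)

# Assumption: a raw viewer displays the same instants in microseconds
micros <- secs * 1e6

# Decode the microsecond values back into date-times
decoded <- as.POSIXct(micros / 1e6, origin = "1970-01-01", tz = "UTC")
print(format(decoded, "%Y-%m-%d %H:%M:%S"))

# A Date value is a count of days since 1970-01-01 (16643 is from BILL_DT)
decoded_date <- as.Date(16643, origin = "1970-01-01")
print(decoded_date)
```

The first two decoded values should match rows 1 and 2 of the tibble in the question, and 16643 should come back as the BILL_DT of row 1.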
Current GUI viewer applications for those formats are quite limited.
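If a viewer cannot render timestamps, one possible workaround (a sketch, not a recommendation) is to write a separate copy of the table in which the date-time column has been formatted as text before export. The data frame `df`, the copy `df_view` and the output file name are hypothetical stand-ins for `testFyl`; the trade-off is that the column is then stored as a string rather than a timestamp, so reading that file back gives character values.

```r
# Small stand-in for testFyl, using two rows from the question
df <- data.frame(
  BILL_NO = c("36/2015-16", "39/15-16"),
  BILL_DT_TIME = as.POSIXct(c(1438021800, 1436898600),
                            origin = "1970-01-01", tz = "UTC"),
  stringsAsFactors = FALSE
)

# Copy intended only for viewing: the date-time column becomes plain text
df_view <- df
df_view$BILL_DT_TIME <- format(df$BILL_DT_TIME, "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Write the viewing copy only if the arrow package is installed
if (requireNamespace("arrow", quietly = TRUE)) {
  arrow::write_parquet(df_view, file.path(tempdir(), "testFyl_view.parquet"))
}
```

Keeping the original file with real timestamp columns alongside the viewing copy avoids losing type information for downstream tools.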
Upvotes: 5