Ujwal Kohad

Reputation: 51

Creating table over written parquet file in Athena

I am creating a Parquet file from a CSV file using the following Python script:

import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

csv_file = '~/Desktop/SWA_UK_Pickup_Forecast_HOURLY_M1_at_2017-11-28-04_20_21-UTC_from_28-Nov-2017_to_28-Nov-2017.csv'

chunksize = 10

csv_stream = pd.read_csv(csv_file, sep=',', chunksize=chunksize, low_memory=False)

for i, chunk in enumerate(csv_stream):
    if i == 0:
        # Infer the schema from the first chunk and open the writer once.
        parquet_schema = pa.Table.from_pandas(df=chunk).schema
        parquet_writer = pq.ParquetWriter('/Users/kohujwal/Desktop/parquet_forecast_file.parquet', parquet_schema, compression='snappy')
    # Write every chunk with the schema inferred from the first chunk.
    table = pa.Table.from_pandas(chunk, schema=parquet_schema)
    parquet_writer.write_table(table)

parquet_writer.close()

Then I upload it manually to an S3 location and create a table over it in Athena. The query used to create the table:

CREATE EXTERNAL TABLE IF NOT EXISTS forecast_report_lom_parquet (
  `forecast_week` int,
  `for_date` string,
  `forecast_day_of_week` string,
  `merchant_id` string,
  `shipper_account_id` string,
  `shipper_name` string,
  `node_id` string,
  `routing_preference` string,
  `forecast_shipment_count` int,
  `forecast_pallet_count` int,
  `forecast_volume` double )
STORED AS PARQUET
LOCATION 's3://prevoir-athena-input-prod/spike-computation/forecast-reports/parquet/live-order-model/'
TBLPROPERTIES ("parquet.compress"="SNAPPY")

The table is created successfully. However, when I query it, it doesn't display the contents: it shows only the row indexes and the column names. Here are the CSV and the Parquet file. The query on the Parquet table:

SELECT * FROM prevoir_prod.forecast_report_lom_parquet

These are the results I am seeing on the console.

Can anyone point out what's going wrong here?

Upvotes: 1

Views: 3759

Answers (1)

Kirk Broadhurst

Reputation: 28738

Your results show row numbers. This means that Athena is finding the file(s) on S3, and is parsing them to the point of identifying rows.

It seems that Athena (or more precisely, the ParquetSerDe) isn't able to get columns from your file. This points to a mismatch between the CREATE EXTERNAL TABLE statement and the actual file. Some possibilities:

  1. The column names or datatypes are wrong
  2. The file isn't compressed as expected (e.g. try without compression)

Whenever I have a file that I can't create a table over, I use the Glue crawler to inspect it and generate the CREATE DDL. It works very well, it's straightforward, and it saves a lot of trial and error.

Upvotes: 1
