Reputation: 816
I am porting a python project (s3 + Athena) from using csv to parquet.
I can make the parquet file, which can be viewed by Parquet View.
I can upload the file to s3 bucket.
I can create the Athena table pointing to the s3 bucket.
However, when I query the table at Athena Web GUI, it runs for 10 mins (it seems that it will never stop) and there is no result shown.
The whole project is complicated. I try to simplify the case.
1.Let say, we have the following csv file (test.csv)
"col1","col2"
"A","B"
2.Then, I use the following python (2.7) code to convert it to parquet file (test.parquet)
import fastparquet
import pandas as pd
df = pd.read_csv(r"test.csv")
fastparquet.write(r"test.parquet", df, compression="GZIP")
3.Upload the test.parquet to s3 bucket folder "abc_bucket/abc_folder" by the s3 Web GUI
4.Create the following table by Athena Web GUI
CREATE EXTERNAL TABLE IF NOT EXISTS abc_folder (
`col1` string,
`col2` string)
STORED AS PARQUET
LOCATION 's3://abc_bucket/abc_folder/'
TBLPROPERTIES (
"parquet.compress"="GZIP"
);
5.Finally, run the following SQL at Athena. The SQL runs for 10 mins and seems forever.
select *
from abc_folder;
My question is which step above is wrong so that I cannot query the table from Athena.
It is highly appreciated for any help.
Upvotes: 1
Views: 19372
Reputation: 387
We can read parquet file in athena by creating a table for given s3 location.
CREATE EXTERNAL TABLE abc_new_table (
dayofweek INT,
flightdate STRING,
uniquecarrier STRING,
airlineid INT
)
PARTITIONED BY (flightdate STRING)
STORED AS PARQUET
LOCATION 's3://abc_bucket/abc_folder/'
tblproperties ("parquet.compression"="SNAPPY");
This assumes s3://abc_bucket/abc_folder/*
directory has the parquet files compressed in SNAPPY format.
More details can be found in this AWS document.
Upvotes: 3
Reputation: 846
Try to view your parquet data in S3 bucket itself with "Select From" option. If its fine, then use Athena to create table of your parquet file with proper table column headers. Later preview table to view the content.
Upvotes: 0