Thang Mai
Thang Mai

Reputation: 141

Athena - Is there any way to create table pointing for specific filename format?

I'm using Athena to query data from multiple files partitioned on S3. I create a

CREATE EXTERNAL TABLE IF NOT EXISTS testing_table (
     EventTime string,
     IpAddress string,
     Publisher string,
     Segmentname string,
     PlayDuration double,
     cost double ) PARTITIONED BY (
     year string,
     month string,
     day string ) ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' ESCAPED BY '\\' LINES TERMINATED BY '\n' LOCATION 's3://campaigns/testing/'; 

In my location, the may have multiple files with different filename, such as: "campaign_au_click.csv", "campaign_au_impression.csv". These files may have different structure.

Is the any way that my above table only getting data from click files.

Thanks

Upvotes: 1

Views: 2995

Answers (2)

Dhaval
Dhaval

Reputation: 1076

The very first option should be to have those files in different folders. But considering that we have the situation right now and we want to query table for specific files. There is a work around.

You create your table with root folder only. But while querying you can have a WHERE clause on filename. The column name for filename is accessed by "$path" (including quotes).

For example, you query can be

SELECT .....
From .....
WHERE
.....
AND
"$path" like "%_click.csv"

Note : The where clause provided is just an example. You can explore regexp_like instead of like.

Upvotes: 2

Henry
Henry

Reputation: 1686

Your best bet is to partition them into different folders. Athena, like Hive, works on the folder level - any and all files in a folder will be taken in as the same schema.

Upvotes: 3

Related Questions