Anum Sheraz
Anum Sheraz

Reputation: 2625

Load only new data from stage into table with Snowpipe

I have a table in snowflake GAME.ALL_GAME_SAVES_RAW with allot of data already init. And inserted daily via snowpipe (upon new files landing in s3).

-- Stage
CREATE STAGE IF NOT EXISTS GAME.GAME_SAVES_STAGE
   URL = 's3://my-bucket/game-saves'
   STORAGE_INTEGRATION = MY_INTEGRATION
   FILE_FORMAT = (TYPE = PARQUET)
;

and snowpipe definition as follows;

CREATE PIPE IF NOT EXISTS GAME.SNOWPIPE_GAME_SAVES_XYZ
   AUTO_INGEST = TRUE
   AS
      COPY INTO GAME.ALL_GAME_SAVES_RAW
         FROM (SELECT $1:"screenName"::VARCHAR                                                AS screenName
                    , $1:"playtime"::NUMBER                                                   AS playtime
                      ...
                    , SPLIT_PART(SPLIT_PART(METADATA$FILENAME, 'game=', 2), '/', 1)::VARCHAR  AS game      
                    , SPLIT_PART(SPLIT_PART(METADATA$FILENAME, 'server=', 2), '/', 1)::NUMBER AS server
                    , SPLIT_PART(SPLIT_PART(METADATA$FILENAME, 'date=', 2), '/', 1)::DATE     AS date
                    , METADATA$FILENAME                                                       AS sys_filename
                    , CURRENT_TIMESTAMP                                                       AS sys_timestamp
                    , TRUE                                                                    AS sys_is_pipe_load
               FROM @GAME.GAME_SAVES_STAGE t)
         FILE_FORMAT = (TYPE = PARQUET);

This table only contains records for game XYZ. Because now we have more games and each game has different schema, so we decide to keep seperate tables for each game. First I clone the existing table;

create table GAME.XYZ_GAME_SAVES_RAW CLONE GAME.ALL_GAME_SAVES_RAW;
drop table GAME.ALL_GAME_SAVES_RAW;
drop pipe GAME.SNOWPIPE_GAME_SAVES_XYZ

Now I need to setup snowpipe to continue working on this new table GAME.XYZ_GAME_SAVES_RAW but do not worry about COPYing existing data from s3.

This is the new snowpipe definition I have (almost similar with stage location @GAME.GAME_SAVES_STAGE/game=xyz);

CREATE PIPE IF NOT EXISTS GAME.SNOWPIPE_GAME_SAVES_XYZ
   AUTO_INGEST = TRUE
   AS
      COPY INTO GAME.XYZ_GAME_SAVES_RAW
         FROM (SELECT $1:"screenName"::VARCHAR                                                AS screenName
                    , $1:"playtime"::NUMBER                                                   AS playtime
                      ...      
                    , SPLIT_PART(SPLIT_PART(METADATA$FILENAME, 'server=', 2), '/', 1)::NUMBER AS server
                    , SPLIT_PART(SPLIT_PART(METADATA$FILENAME, 'date=', 2), '/', 1)::DATE     AS date
                    , METADATA$FILENAME                                                       AS sys_filename
                    , CURRENT_TIMESTAMP                                                       AS sys_timestamp
                    , TRUE                                                                    AS sys_is_pipe_load
               FROM @GAME.GAME_SAVES_STAGE/game=xyz t)
         FILE_FORMAT = (TYPE = PARQUET);

Now the problem is, this pipe tries to load all previous data from stage location @GAME.GAME_SAVES_STAGE/game=xyz. Which I don't want. I already have the old data cloned from old table. Is there any way I can ask snowpipe to NOT load old data while creating pipe ? I tried AUTO_INGEST = FALSE fist and then ALTER SET AUTO_INGEST = TRUE but we cannot change AUTO_INGEST parameter on pipe once its created.

Upvotes: 0

Views: 173

Answers (0)

Related Questions