Reputation: 371
I have loaded the File data from Azure blob storage to Azure SQL DW external table through poly-base. Now the File present in Blob container has been updated. Now I want to load the fresh data. Can any one suggest how the fresh data can be loaded to external table through poly base.? I am trying a way to drop the external table if exists and create it again to load the fresh data.
Upvotes: 3
Views: 8257
Reputation: 14389
There is no need to drop external tables to view new data. However you can use the DROP EXTERNAL TABLE
syntax to drop Polybase / external tables if required, eg to change the definition or REJECT_TYPE
. You can also check DMV sys.external_tables
for their existence prior to dropping them, eg
IF EXISTS ( SELECT * FROM sys.external_tables WHERE object_id = OBJECT_ID('yourSchema.yourTable') )
DROP EXTERNAL TABLE yourSchema.yourTable
GO
Azure SQL Data Warehouse does not yet support the DROP IF EXISTS (DIE
) syntax as in SQL Server 2016. However as mentioned there should be no need to drop external tables just to view new data. If the blob file has been updated then new data will simply appear in the external table next time you query it.
Another approach is to supply a directory name in your external table definition. Then by simply dropping new files in the folder, the data will appear next time you query the table, eg
CREATE EXTERNAL TABLE dbo.DimDate2External (
DateId INT NOT NULL,
CalendarQuarter TINYINT NOT NULL,
FiscalQuarter TINYINT NOT NULL
)
WITH (
LOCATION='/textfiles/dimDate/',
DATA_SOURCE=AzureStorage,
FILE_FORMAT=TextFile
);
So say if you had an initial file in that folder called DimDate1.txt
and then added a new file called DimDate2.txt
it would appear in the table as one.
Upvotes: 8
Reputation: 41
if you have created an External table as specified in https://msdn.microsoft.com/en-us/library/dn935021.aspx, then there shouldn't be nothing to do.
The external table will be a "pointer" to your file, and every time you query the table, data will be read from the original file. This way if you update the file there are no actions to take on Azure SQL DW.
If you have imported your data IN Azure SQL DW using CREATE TABLE AS SELECT syntax (see https://msdn.microsoft.com/en-us/library/mt204041.aspx), reading from an external table, you will need to drop the table, but not the external one, as the above applies here as well, and when you query the external table the updated file will be read.
So:
--creating an external table (using defined external data source and file format):
CREATE EXTERNAL TABLE ClickStream (
url varchar(50),
event_date date,
user_IP varchar(50)
)
WITH (
LOCATION='/webdata/employee.tbl',
DATA_SOURCE = mydatasource,
FILE_FORMAT = myfileformat
)
;
When you select from ClickStream it will always read content from /webdata/employee.tbl file. If you only update the employee.tbl file with new data, there are no actions to take.
Instead:
--Use CREATE TABLE AS SELECT to import the Azure blob storage data into a new
--SQL Data Warehouse table called ClickStreamData
CREATE TABLE ClickStreamData
WITH
(
CLUSTERED COLUMNSTORE INDEX,
DISTRIBUTION = HASH (user_IP)
)
AS SELECT * FROM ClickStream
;
data will be copied to ClickStreamData table in the instance, and updates to file will not be reflected. In this case you will need to drop ClickStreamData and re-create it, but you can still use ClickStream as the source, as that external table will read data from the updated file.
Upvotes: 2