Reputation: 31
I'm writing a parquet file to an azure data lake storage system, via databricks. I'm using the following to write the parquet file:
cultural_split.write.partitionBy("filename").mode('overwrite').parquet("/mnt/crime/transformed-data")
I'm then creating an external table using the parquet file, and I can query the table using SQL in synapse.After roughly 30 minutes I then get this error while trying to query the table:
External table 'dbo.cultural_data' is not accessible because location does not exist or it is used by another process.
I then check my data lake and can see that the parquet file is no longer there. If I check the 'show' deleted objects, I still can't see the file.
Is there something that I'm missing while creating the external table? I have soft deletes turned off for this blob storage and I can't think of any other process that might be deleting this file.
Here's the code I use to create my external table:
IF NOT EXISTS (SELECT * FROM sys.external_file_formats WHERE name = 'SynapseParquetFormat')
CREATE EXTERNAL FILE FORMAT [SynapseParquetFormat]
WITH ( FORMAT_TYPE = PARQUET)
GO
IF NOT EXISTS (SELECT * FROM sys.external_data_sources WHERE name = 'crime-data_crimedatasam_dfs_core_windows_net')
CREATE EXTERNAL DATA SOURCE [crime-data_crimedatasam_dfs_core_windows_net]
WITH (
LOCATION = 'abfss://[email protected]'
)
GO
CREATE EXTERNAL TABLE dbo.crime_data (
[ethnicity] nvarchar(80),
[police_district] nvarchar(80),
[lower_age] int,
[upper_age] int
)
WITH (
LOCATION = 'transformed-data/filename=crime_data/part-00000-tid-3588581976832316109-20e2477d-82c0-47d9-ba75-cf341910855e-20-1.c000.snappy.parquet',
DATA_SOURCE = [crime-data_sam_dfs_core_windows_net],
FILE_FORMAT = [SynapseParquetFormat]
)
GO
SELECT TOP 100 * FROM dbo.crime_data
GO
Many thanks,
Sam
Upvotes: 0
Views: 896
Reputation: 3250
Know more about Manage expiration of Azure Blob storage in Azure CDN
I have tried the below approach pointing the partion folder and creating an external table:
IF NOT EXISTS (SELECT * FROM sys.external_file_formats WHERE name = 'SynapseParquetFormat')
CREATE EXTERNAL FILE FORMAT [SynapseParquetFormat]
WITH ( FORMAT_TYPE = PARQUET)
GO
IF NOT EXISTS (SELECT * FROM sys.external_data_sources WHERE name = 'folder02_dileepsynpadls_dfs_core_windows_net')
CREATE EXTERNAL DATA SOURCE [folder02_dileepsynpadls_dfs_core_windows_net]
WITH (
LOCATION = 'abfss://<CONTAINER>@<STORAGE ACCOUNT>.dfs.core.windows.net'
)
GO
CREATE EXTERNAL TABLE dbo.file_name_ext_table (
[age] bigint
)
WITH (
LOCATION = 'transformed-data/**',
DATA_SOURCE = [folder02_dileepsynpadls_dfs_core_windows_net],
FILE_FORMAT = [SynapseParquetFormat]
)
GO
SELECT TOP 100 * FROM dbo.file_name_ext_table
GO
Upvotes: 0