Alec

Reputation: 21

MS Fabric - Notebook data load from a lakehouse that's NOT default?

I've loaded my sources and medallion stages into their own lakehouses in their own workspaces, for scalability and to keep things tidy from the get-go. I'm now in a position where it would be REALLY useful to pull data in from a source within a notebook, run some SQL on it, and load it into a layer. A pretty standard, simple transformation step.

I can add those lakehouses to my notebook so they're both referenced there, but I can't get data from any lakehouse that isn't the default for that notebook. If we can't get data from it, why can we add it? Is this something that's coming soon, or am I doing something wrong?

When I select 'load data' from a table in a lakehouse that isn't the default for the notebook, it auto-populates the cell, starting with the comment:

# With Spark SQL, Please run the query onto the lakehouse which is from the same workspace as the current default lakehouse.

df = spark.sql("SELECT * FROM SilverLayerLakehouse.myTable LIMIT 10")
display(df)

and trying to run that cell yields the error:

AnalysisException: Table or view not found: SilverLayerLakehouse.myTable; line 1 pos 14; 'GlobalLimit 10 +- 'LocalLimit 10 +- 'Project [*] +- 'UnresolvedRelation [SilverLayerLakehouse, myTable], [], false

If I switch that lakehouse to the default, it runs fine, but I need to load from source and store in silver with some SQL in between. With OneLake I assumed my data would be visible from anywhere, and that this was the whole idea behind it, but I can't seem to get my notebook to read anything from a lakehouse in a different workspace. Am I missing something, or is this functionality coming further down the line?

Upvotes: 2

Views: 5517

Answers (1)

David Browne - Microsoft

Reputation: 89361

The preferred way is to create Shortcuts in your local Lakehouse to the tables in your remote Lakehouse or Warehouse.
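Once a shortcut exists in the notebook's default lakehouse (created through the lakehouse UI), the remote table can be queried as if it were local. A minimal sketch, assuming a shortcut named bronze_myTable (a hypothetical name) pointing at the remote table:

# Assumption: "bronze_myTable" is a shortcut in the default lakehouse
# that points at a table in a lakehouse in another workspace.
df = spark.sql("SELECT * FROM bronze_myTable LIMIT 10")
display(df)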

Alternatively, you can use an absolute path to reference tables in another workspace. Use the workspace ID and lakehouse ID in the URI, like this:

df = spark.read.format("Delta").load("abfss://<Workspace GUID>@onelake.dfs.fabric.microsoft.com/<Lakehouse GUID>/Tables/<Table Name>")

See https://learn.microsoft.com/en-us/fabric/onelake/onelake-access-api
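Putting this together with the read-transform-write step from the question, a minimal sketch might look like the following. The GUIDs are placeholders to fill in with your own, and the column names and filter are hypothetical:

# Read the source table from the remote lakehouse by its absolute OneLake path.
source_path = "abfss://<Source Workspace GUID>@onelake.dfs.fabric.microsoft.com/<Source Lakehouse GUID>/Tables/myTable"
df = spark.read.format("delta").load(source_path)

# Expose it to Spark SQL via a temp view so the transformation can be written in SQL.
df.createOrReplaceTempView("source_myTable")
silver_df = spark.sql("SELECT col1, col2 FROM source_myTable WHERE col1 IS NOT NULL")

# Write the result to the silver lakehouse in another workspace, again by absolute path.
silver_path = "abfss://<Silver Workspace GUID>@onelake.dfs.fabric.microsoft.com/<Silver Lakehouse GUID>/Tables/myTable"
silver_df.write.format("delta").mode("overwrite").save(silver_path)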

Upvotes: 1
