Reputation: 197
I would like to create an API endpoint in Azure Databricks that would publish delta table data to a user. All I found by searching there is no REST API to expose delta table content. Is it true?
Upvotes: 4
Views: 6377
Reputation: 87279
You can just leverage the Databricks SQL Statement Execution API to access data on Azure Databricks. Follow the tutorial that shows how to work with that API using curl
Upvotes: 3
Reputation: 8160
In Databricks there is no REST Api to get data in delta table, but we can get data of delta table using Databricks jobs.
Follow below steps,
First, create a notebook in databricks which gets data in delta table and returns the data in Json format. Add below code in your notebook, alter accordingly to your table.
df = spark.sql("SELECT * FROM test")
j_son = df.toJSON()
res=""
for i in j_son.collect():
res = res + "," + i
dbutils.notebook.exit("["+res[1:]+"]")
Then create a job using this notebook.
In the next page create the task name, select the path to notebook and your cluster on which it needs to be run. Then hit on create.
After creating you will get details as below, copy the job id.
This is the job in databricks provides API's for running job, getting results of that particular job.
For more information follow this documentation.`
Next create an backed server where the API request raised to databricks job. I've used javascript and expressJs framework.
In visual code open new folder and execute below command in terminal. Make sure you have nodeJs
installed in your system.
npm install express axios cors
You will get the project structure as below in your folder.
Next, create .js file, mine it's btst.js
.
After creating, add below code in that file.
const express = require('express');
const axios = require('axios');
const cors = require('cors');
const app = express();
app.use(cors({
origin:'*'
}));
const port = 5000;
// Set the access token for authentication
const token = 'your databricks token';
// Set the base URL for the Databricks REST API
const baseUrl = 'https://your databricks host/api/2.1/';
app.get('/runjob', async function(req, res) {
try {
const headers = {
'Authorization': `Bearer ${token}`,
'Content-Type': 'application/json'
}
function sleep(ms) {
return new Promise(resolve => setTimeout(resolve, ms));
}
const requestBody = {
job_id: "1121949406005982"
};
const response = await axios.post(`${baseUrl}jobs/run-now/`, requestBody, { headers });
console.log(`run id : ${response.data.run_id}`)
console.log("job is running...")
const t_res = await axios.get(`${baseUrl}jobs/runs/get?run_id=${response.data.run_id}`,{ headers })
task_id = t_res.data.tasks[0].run_id
console.log(`task id : ${task_id}`)
await sleep(15000)
const d_res = await axios.get(`${baseUrl}jobs/runs/get-output?run_id=${task_id}`,{ headers })
j_res = JSON.parse(d_res.data.notebook_output.result)
console.log(j_res)
res.send(j_res);
} catch (error) {
console.log(error)
res.status(500).send(`Error1.......${error}`);
}
});
app.listen(port, () => {
console.log(`Server listening at http://localhost:${port}`);
});
Here you paste the job id you copied earlier in databricks.
Then execute below command in terminal.
node yourfilename.js
In my case it is
node btst.js
Then your server will be running at http://localhost:5000
.
You send GET
request to this url in browser or using postman at endpoint.
http://localhost:5000/runjob
, you will get the data in json format so you can use it anywhere.
This is the below result, it takes 15 seconds to display because after sending post request to jobs it needs to run the task, it takes about 10-15 seconds.
and data in databricks table is
Upvotes: 3