Martinsos

Reputation: 1693

How can I parse an Azure Blob URI in Node.js/JavaScript?

I need to parse an Azure Blob URI in Node.js and extract the storage account name, container name, and blob name.

I looked through both azure-sdk-for-node and azure-storage-node but found no method for doing so.

I would also like to detect when a Blob URI is invalid, so a regex (if possible) would probably be a good way to go.

Some examples of Blob URIs:

  1. https://myaccount.blob.core.windows.net/mycontainer/myblob
  2. http://myaccount.blob.core.windows.net/myblob
  3. https://myaccount.blob.core.windows.net/$root/myblob

Upvotes: 1

Views: 1559

Answers (2)

Thomas

Reputation: 29491

You can use url.parse.

My main reason is to avoid a regular expression; the result is also easier to understand, read, and modify.

Here is some sample code:

const url = require('url')

const parseAzureBlobUri = (blobUrl) => {
    let uri = url.parse(blobUrl)

    // Extract the storage account name
    let storageAccountName = uri.hostname.split('.')[0]        

    // Remove the leading slash, then extract the segments
    let segments = uri.pathname.substring(1).split('/')

    // If only one segment, this is the blob name
    if(segments.length === 1){
        return {
            storageAccountName,
            containerName: '$root',
            blobName: segments[0]
        }
    }

    // get the container name
    let containerName = segments[0]

    // Remove the containername from the segments
    segments.shift()

    return {
        storageAccountName,
        containerName,
        blobName: segments.join('/')
    }
}
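
The function above does not reject invalid input, which the question also asks for. As a rough sketch only, the same segment-splitting idea can be combined with the global URL class, which throws on malformed input. The name parseAzureBlobUrl and the constant BLOB_HOST_SUFFIX are hypothetical, the host suffix assumes the public Azure cloud endpoint, and the account-name pattern is borrowed from the regex-based answer on this page:

```javascript
// Hypothetical constant: assumes the public-cloud endpoint used in the question's examples.
const BLOB_HOST_SUFFIX = '.blob.core.windows.net'

const parseAzureBlobUrl = (blobUrl) => {
  // new URL() throws a TypeError when the input is not a valid absolute URL.
  const uri = new URL(blobUrl)

  if (uri.protocol !== 'http:' && uri.protocol !== 'https:') {
    throw new Error('Invalid blob uri: unsupported protocol.')
  }
  if (!uri.hostname.endsWith(BLOB_HOST_SUFFIX)) {
    throw new Error('Invalid blob uri: unexpected host.')
  }

  // Everything before the suffix is treated as the storage account name.
  const storageAccountName = uri.hostname.slice(0, -BLOB_HOST_SUFFIX.length)
  if (!/^[a-z0-9]{3,24}$/.test(storageAccountName)) {
    throw new Error('Invalid blob uri: bad storage account name.')
  }

  // Drop the leading slash, then split the (still percent-encoded) path.
  const segments = uri.pathname.substring(1).split('/')
  if (segments[0] === '') {
    throw new Error('Invalid blob uri: missing blob name.')
  }

  // A single segment means the blob lives in the implicit $root container.
  const containerName = segments.length === 1 ? '$root' : segments.shift()
  return { storageAccountName, containerName, blobName: segments.join('/') }
}
```

This sketch does not check container or blob naming rules, and it leaves the blob name percent-encoded, so treat it as a starting point rather than a complete validator.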

Upvotes: 4

Martinsos

Reputation: 1693

Following the Azure specification, I came up with the following function (gist), which uses a regex to parse the blob URI and throws an error if the blob URI is invalid.

The storage account name and container name patterns should be precise; I left only the blob name pattern somewhat loose, since it is more complex to define.

/**
 * Validates and parses given blob uri and returns storage account, 
 * container and blob names.
 * @param {string} blobUri - Valid Azure storage blob uri.
 * @returns {Object} With following properties:
 *   - {string} storageAccountName
 *   - {string} containerName
 *   - {string} blobName
 * @throws {Error} If blobUri is not valid blob uri.
 */
const parseAzureBlobUri = (blobUri) => {
  const ERROR_MSG_GENERIC = 'Invalid blob uri.'

  const storageAccountRegex = new RegExp('[a-z0-9]{3,24}')
  // The lookahead is limited to name characters so it cannot scan into the blob name.
  const containerRegex = new RegExp('[a-z0-9](?![a-z0-9-]*--)[a-z0-9-]{1,61}[a-z0-9]')
  const blobRegex = new RegExp('.{1,1024}')  // TODO: Consider making this one more precise.
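  // String.raw keeps the backslashes, so \. and \$ reach RegExp as escapes
  // instead of being dropped by the template literal.
  const blobUriRegex = new RegExp(
    String.raw`^https?:\/\/(${ storageAccountRegex.source })\.blob\.core\.windows\.net\/`
    + String.raw`(?:(\$root|(?:${ containerRegex.source }))\/)?(${ blobRegex.source })$`
  )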
  const match = blobUriRegex.exec(blobUri)
  if (!match) throw Error(ERROR_MSG_GENERIC)

  return {
    storageAccountName: match[1],
    // If not specified, then it is implicitly root container with name $root.
    containerName: match[2] || '$root',
    blobName: match[3]
  }
}
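
The naming rules packed into the regexes above are easier to see when each name is checked on its own. The following is a minimal sketch with hypothetical helper names (isValidStorageAccountName, isValidContainerName); it assumes the rules the patterns encode: 3 to 24 lowercase letters or digits for the account, and 3 to 63 lowercase letters, digits, or hyphens for the container, starting and ending with a letter or digit and with no consecutive hyphens.

```javascript
// Hypothetical helpers; the patterns mirror the ones used in the function above,
// anchored with ^ and $ so the whole name must match.
const isValidStorageAccountName = (name) => /^[a-z0-9]{3,24}$/.test(name)

const isValidContainerName = (name) =>
  // $root is the special root container; everything else follows the general rule.
  name === '$root' || /^[a-z0-9](?!.*--)[a-z0-9-]{1,61}[a-z0-9]$/.test(name)

console.log(isValidStorageAccountName('myaccount'))  // true
console.log(isValidContainerName('my-container'))    // true
console.log(isValidContainerName('my--container'))   // false: consecutive hyphens
console.log(isValidContainerName('ab'))              // false: shorter than 3 characters
```

Because each pattern is anchored here, the unrestricted lookahead (?!.*--) is safe: it can only see the rest of the name, not the blob path that follows it in a full URI.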

Upvotes: 0
