Dave

Reputation: 474

How to read an xlsx blob into pandas from an Azure Function in Python

I am reading .xlsx data from a blob in an Azure Function. My code looks something like this:

def main(techdatablob: func.InputStream, crmdatablob: func.InputStream, outputblob: func.Out[func.InputStream]):

    # Load in the tech and crm data
    crm_data = pd.read_excel(crmdatablob.read().decode('ISO-8859-1'))
    tech_data = pd.read_excel(techdatablob.read().decode('ISO-8859-1'))
   

The issue is when I try to decode the files, I get the following error:

ValueError: Protocol not known: PK...

The "..." is followed by a lot of strange characters. Any thoughts on how to read these files properly?

Upvotes: 2

Views: 1561

Answers (1)

Frank Borzage

Reputation: 6796

You don't need decode('ISO-8859-1'). Read the blob as raw bytes and hand those to pandas, as in my code:

import io
import logging

import pandas as pd
import azure.functions as func


def main(techdatablob: func.InputStream, crmdatablob: func.InputStream, outputblob: func.Out[func.InputStream]):
    logging.info("Python blob trigger function processed blob \n"
                 f"Name: {techdatablob.name}\n"
                 f"Blob Size: {techdatablob.length} bytes")

    # Load in the tech and crm data. read() returns the raw bytes of the blob;
    # wrapping them in BytesIO gives pandas a file-like object (recent pandas
    # versions deprecate passing bare bytes to read_excel).
    crm_data = pd.read_excel(io.BytesIO(crmdatablob.read()))
    logging.info(f"{crm_data}")
    tech_data = pd.read_excel(io.BytesIO(techdatablob.read()))
    logging.info(f"{tech_data}")
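
Some background on the error, as a minimal sketch using only the standard library (no Azure or pandas needed). An .xlsx file is a ZIP archive, so its bytes start with the signature "PK". Decoding those bytes produces a str, and pandas treats a str argument as a path or URL rather than as file content, which is most likely where "Protocol not known: PK..." comes from. The helper name and the archive member name below are made up for illustration.

```python
import io
import zipfile


def make_zip_bytes() -> bytes:
    """Build a tiny in-memory ZIP archive as a stand-in for .xlsx content."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # Hypothetical member name; a real workbook holds several XML parts.
        archive.writestr("placeholder.xml", "<data/>")
    return buffer.getvalue()


raw = make_zip_bytes()

# ISO-8859-1 maps every byte to a character, so decoding never fails,
# but the result is text, not a file.
decoded = raw.decode("ISO-8859-1")

print(raw[:2])       # b'PK'
print(decoded[:2])   # PK

# Wrapped in BytesIO, the untouched bytes are still a readable archive.
print(zipfile.is_zipfile(io.BytesIO(raw)))  # True
```

So the content has to stay as bytes (or a file-like object around them) all the way into read_excel.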

Note: Your function.json should look like this. Otherwise, an error will occur.

    {
      "name": "techdatablob",
      "type": "blobTrigger",
      "direction": "in",
      "path": "path1/{name}",
      "connection": "example"
    },
    {
      "name": "crmdatablob",
      "dataType": "binary",
      "type": "blob",
      "direction": "in",
      "path": "path2/data.xlsx",
      "connection": "example"
    },
    {
      "name": "outputblob",
      "type": "blob",
      "direction": "out",
      "path": "path3/out.xlsx",
      "connection": "example"
    }

The difference from your function.json is that yours is missing the dataType attribute.
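
If you want to check a function.json for this quickly, here is a rough sketch using only the standard library. The function name is hypothetical, the sample config assumes the usual top-level "bindings" array, and the check is limited to input bindings of type "blob", mirroring the example above where only that binding sets dataType. The sample deliberately omits dataType on crmdatablob to show what gets reported.

```python
import json
from typing import List

# Sample config for illustration; crmdatablob is missing "dataType" on purpose.
FUNCTION_JSON = """
{
  "bindings": [
    {"name": "techdatablob", "type": "blobTrigger", "direction": "in",
     "path": "path1/{name}", "connection": "example"},
    {"name": "crmdatablob", "type": "blob", "direction": "in",
     "path": "path2/data.xlsx", "connection": "example"},
    {"name": "outputblob", "type": "blob", "direction": "out",
     "path": "path3/out.xlsx", "connection": "example"}
  ]
}
"""


def blob_inputs_missing_binary(config_text: str) -> List[str]:
    """Return names of blob input bindings whose dataType is not "binary"."""
    config = json.loads(config_text)
    return [
        binding["name"]
        for binding in config.get("bindings", [])
        if binding.get("type") == "blob"
        and binding.get("direction") == "in"
        and binding.get("dataType") != "binary"
    ]


print(blob_inputs_missing_binary(FUNCTION_JSON))  # ['crmdatablob']
```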


My test ran without any problems.


Upvotes: 4
