Lilly

Reputation: 988

Extract table from SAP BW to Azure Data Lake Gen2 using data factory

I would like to know the procedure to extract a table from SAP BW, installed on the Azure cloud, to Azure Data Lake Gen2. I want to use ADF to copy data from SAP BW to the data lake.

Can we connect ADF to SAP directly with the SAP connector? Do I have to install an Integration Runtime or any VM for this connection? What's the difference between the SAP BW Open Hub connector and SAP BW via MDX?

I would like to hear from experts on how to extract data from SAP BW when SAP is also hosted on Azure. Thanks.

Upvotes: 2

Views: 2459

Answers (1)

Vitali Dedkov

Reputation: 318

I am not an expert, but the difference was explained to me by a BW person: you can use both, but with Open Hub you can run an extract on a BW query without involving a BW person, although the performance will not be great. With MDX, I believe additional development needs to be set up on the BW side, but the performance is better.
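To make the distinction a bit more concrete, here is a minimal sketch of how the two options show up in an ADF copy activity source, based on my reading of the connector docs; the MDX statement and the property values are placeholders, so verify them against the current documentation:

```python
# Sketch only: the two BW options appear as different copy-activity source types.
# "SapBwSource" takes an MDX query (the SAP BW via MDX connector); "SapOpenHubSource"
# reads from an Open Hub destination that is configured on the dataset side.
mdx_source = {
    "type": "SapBwSource",
    "query": "<your MDX statement against the BW cube>",  # placeholder
}

open_hub_source = {
    "type": "SapOpenHubSource",
    # Optional flag from the docs; the Open Hub destination name itself is
    # defined on the dataset, not here.
    "excludeLastRequest": True,
}
```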

Also keep in mind that when I was running those queries I found them hard to parallelize; the Microsoft docs did not provide a good example, and whatever I pushed to BW was sent as a single query.

Alternatively, my recent use case was to get data out of a table in SAP BW rather than a cube, so the following might work for you.

I followed the instructions listed for the "SAP Table" connector.

For this process to work you will need a self-hosted IR (either on your laptop or on a VM that is attached to ADF), and you will need to install the following drivers:

SAP Table connector requirements

To get those drivers you will probably need to reach out to your Basis team. They will also need to create an Interface role (especially if this is your first time making this connection and you want a service account that can be re-used by other processes).
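To show where the self-hosted IR and the service account fit together, here is a rough sketch of a SapTable linked service definition, written as a Python dict that mirrors the ADF JSON. The server, system number, client, account, and IR name are all placeholders, and the property names follow the SAP Table connector docs as I understand them, so double-check them for your version:

```python
import json

# Sketch of an ADF "SapTable" linked service; everything in angle brackets is
# a placeholder for your landscape.
sap_table_linked_service = {
    "name": "SapBwTableLinkedService",
    "properties": {
        "type": "SapTable",
        "typeProperties": {
            "server": "<sap-application-server-host>",  # must be reachable from the IR machine
            "systemNumber": "<system-number>",
            "clientId": "<client>",
            "userName": "<interface-service-account>",  # the account the Basis team creates
            "password": {"type": "SecureString", "value": "<password>"},
        },
        # This is the piece that routes the connection through the self-hosted IR,
        # i.e. the laptop or VM where the SAP RFC drivers are installed.
        "connectVia": {
            "referenceName": "MySelfHostedIR",
            "type": "IntegrationRuntimeReference",
        },
    },
}

print(json.dumps(sap_table_linked_service, indent=2))
```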

After all of that you also need to have RFC authorizations added to this Interface. The ones below worked for me. The Microsoft website does suggest a set of RFC authorizations, but those are almost at admin level and our Basis team did not want to grant that:

S_RFC:
  FUGR  - RFC1, SYST, SYSU
  FUNC  - RFCPING, RFC_FUNCTION_SEARCH
  ACTVT - 16

In addition to the above, we had to run a couple of tests and found that, depending on which tables you want to pull data from, the Basis team might need to add further authorizations that allow read access to just those tables.

The above process was the one I followed, so yours might look a little different, but to make this work you need: a self-hosted IR, the SAP drivers installed on those IRs, firewall rules allowing you to access the BW system ID, an Interface created by Basis, and the RFC authorizations.
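Once those pieces are in place, the copy activity itself is fairly small. Below is a sketch of the source/sink section; the dataset names and the Parquet sink are my own assumptions, and the property names come from the ADF copy activity docs as I remember them, so treat it as a starting point rather than a working definition:

```python
# Sketch of a copy activity that moves one SAP table into ADLS Gen2.
# "SapTableDataset" and "AdlsGen2ParquetDataset" are hypothetical dataset names.
copy_activity = {
    "name": "CopySapTableToLake",
    "type": "Copy",
    "inputs": [{"referenceName": "SapTableDataset", "type": "DatasetReference"}],
    "outputs": [{"referenceName": "AdlsGen2ParquetDataset", "type": "DatasetReference"}],
    "typeProperties": {
        "source": {
            "type": "SapTableSource",
            "rowCount": 1000,  # optional: cap the rows while testing the RFC authorizations
        },
        "sink": {"type": "ParquetSink"},
    },
}
```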

I have opened an issue on the Microsoft GitHub documentation about the incorrect RFC authorization list: https://github.com/MicrosoftDocs/azure-docs/issues/60637

Also keep in mind the way ADF pulls the data: it first sends the query to BW, BW then creates a file on its end collecting that data, and the file is then sent back to the self-hosted IR, which writes it into a storage account through ADF. If the file is too large, the pipeline can fail, not because of ADF but because of limitations on the BW side.
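One thing that can help with both the single-query behaviour I mentioned above and the large-file failures is the partition support on the SAP Table source, which splits the extraction into several smaller RFC calls instead of one big file on the BW side. A sketch, with a placeholder partition column and bounds; check the connector docs for which partition options your table supports:

```python
# Sketch of a partitioned SapTableSource: ADF issues several smaller extractions
# in parallel instead of asking BW to produce one large file.
partitioned_source = {
    "type": "SapTableSource",
    "partitionOption": "PartitionOnCalendarYear",
    "partitionSettings": {
        "partitionColumnName": "GJAHR",   # placeholder: a year column on the table
        "partitionLowerBound": "2015",
        "partitionUpperBound": "2021",
        "maxPartitionsNumber": 4,         # how many parallel reads ADF issues
    },
}
```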

Hopefully my experience can help someone else who is stuck :)

Upvotes: 2
