Reputation: 2101
Report authoring in Power BI is done in Power BI Desktop, which is installed on users' workstations. Report sharing is done in the Power BI cloud service (either shared or dedicated capacity). This means that different resources (i.e., memory, CPU, disk) are available during report authoring versus report sharing, particularly for data load (dataset refresh). So it seems impossible to test a report's data load / ETL performance prior to releasing it to production (i.e., publishing to the cloud service). And, usually, data load performance is faster in the cloud service than in Desktop. Because my reports contain a lot of data and transformations, data loads in Desktop can take a long time.

How can I make the resources available to Desktop identical to the resources in the cloud service, so that I can reduce data load times in Desktop (during development) and predict performance in the cloud service?
Perhaps a better question to ask is, should I even be doing this? That is, should I be trying to predict (in Desktop) a report's refresh performance in the cloud service (and / or load production-level data volumes into Desktop during development)?
Upvotes: 1
Views: 1211
Reputation: 122
At some point I developed a tool that uses Microsoft's PowerBI-Tools-For-Capacities under the hood.
Upvotes: 0
Reputation: 4967
Microsoft does not specify what hardware (CPU/memory) is used in the Power BI Service. It is also a shared service, so more than one Power BI tenancy could be hosted on the same cluster. They do mention that you may suffer from "noisy neighbour" issues: if another tenancy is hitting the cluster hard, your performance may suffer.

I know from experience that the memory available in the shared service is greater than 25 GB, as queries that would not run on Premium P1 nodes have run OK in the service. With dedicated (Premium) nodes, you can use the admin reports to see what's going on in the background: query times, refresh times, and CPU/memory usage.
There are a few issues with trying to performance-test Desktop vs the Service. For example, a SQL query in Desktop will run twice: first to check the structure and data, then a second time to get the data. This doesn't happen once the report is deployed to the Service, so in that example your load will be quicker there.
If you are accessing on-premises data, loads will be quicker in Desktop than in the Service, as the Service has to go via a gateway. Conversely, if you are connecting to an Azure SQL Database, the connections and bandwidth between Azure services will be slightly quicker once the report is deployed to the Service than from a Desktop connection, because in the Desktop case the data has to travel outside the data centre to reach you.
So, for import datasets, you can look at the dataset refresh start and end times in the Service and work out how long the refresh actually took.
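If you want to automate collecting those timings, the refresh history is also exposed through the Power BI REST API ("Get Refresh History In Group"). Here is a minimal Python sketch; the access token, workspace ID, and dataset ID are placeholders you would supply yourself, and the output formatting is just an example.

```python
# Minimal sketch (assumptions flagged): pulls dataset refresh history from the
# Power BI REST API ("Get Refresh History In Group") and prints each refresh's
# duration. You need an Azure AD access token with dataset read permission;
# the token, workspace ID and dataset ID below are placeholders.
from datetime import datetime

import requests

ACCESS_TOKEN = "<aad-access-token>"   # placeholder
GROUP_ID = "<workspace-guid>"         # placeholder
DATASET_ID = "<dataset-guid>"         # placeholder

url = (
    f"https://api.powerbi.com/v1.0/myorg/groups/{GROUP_ID}"
    f"/datasets/{DATASET_ID}/refreshes?$top=10"
)
resp = requests.get(url, headers={"Authorization": f"Bearer {ACCESS_TOKEN}"})
resp.raise_for_status()


def parse_ts(ts: str) -> datetime:
    # The API returns ISO 8601 UTC timestamps such as "2024-05-01T06:00:05.79Z".
    # Trim the trailing "Z" and fractional seconds so the stdlib parser accepts
    # them; sub-second precision doesn't matter for refresh durations.
    return datetime.fromisoformat(ts.rstrip("Z").split(".")[0])


for refresh in resp.json()["value"]:
    end_time = refresh.get("endTime")
    if end_time is None:
        continue  # refresh still in progress
    duration = (parse_ts(end_time) - parse_ts(refresh["startTime"])).total_seconds()
    print(f"{refresh['refreshType']:<12} {refresh['status']:<10} {duration:8.1f} s")
```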
For a baseline test, generate 1 million rows of data; it doesn't have to be complex. Test the load time in Desktop a few times to get an average, then deploy and try it in the Service. Keep adding 1 million rows at a time to see whether there is a linear relationship between the row count and the time taken.
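As an illustration of that baseline, here is one way to generate the test rows outside Power BI as a flat CSV you can import in both Desktop and the Service; the file name and columns are arbitrary choices, not anything prescribed.

```python
# Sketch for generating the baseline test data: N rows of simple columns
# written to a CSV for import into Power BI Desktop and the Service.
import csv
import random
from datetime import date, timedelta

ROWS = 1_000_000          # bump by 1,000,000 for each subsequent test
OUTPUT = "baseline_1m.csv"

start_date = date(2020, 1, 1)

with open(OUTPUT, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["Id", "Date", "Category", "Amount"])
    for i in range(ROWS):
        writer.writerow([
            i,
            start_date + timedelta(days=i % 1461),   # ~4 years of dates
            f"Category {i % 10}",                    # 10 distinct categories
            round(random.uniform(1, 1000), 2),       # random amount
        ])
```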
However, it will not be a full like-for-like comparison, as it depends on the type of data, its location, and network speed, but it should give you a fair indication of any performance increase you may get when using the Service, and help you weigh your Desktop machine's spec against the Service.
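To follow up on the scaling test above, a quick way to check for a roughly linear relationship is to fit a line through your measured (row count, load time) pairs for Desktop and the Service. The timings in this sketch are placeholders, not real measurements.

```python
# Sketch for checking whether load time scales roughly linearly with row count.
# The timings below are made-up placeholders; substitute your own measured
# averages from Desktop and from the Service refresh history.
rows = [1_000_000, 2_000_000, 3_000_000, 4_000_000]
desktop_seconds = [38.0, 75.0, 118.0, 152.0]   # placeholder measurements
service_seconds = [22.0, 41.0, 64.0, 83.0]     # placeholder measurements


def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope in seconds/row, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x


for label, times in [("Desktop", desktop_seconds), ("Service", service_seconds)]:
    slope, intercept = fit_line(rows, times)
    print(f"{label}: ~{slope * 1_000_000:.0f} s per extra million rows "
          f"(fixed overhead ~{intercept:.0f} s)")
```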
Upvotes: 1