Understand the performance of a Semantic Model refresh -- twice as fast with StreamBeforeRequestCompletes

In one of my projects we have been migrating a cube from an on-premises SQL Server Analysis Services (SSAS) instance to Power BI Premium/Fabric semantic models. With XMLA endpoints, migration is a breeze, and one can easily use Tabular Editor to deploy the same model both on-premises and to the cloud.

In the relevant SSAS model we have a dozen fact tables, each containing several million to about 150 million rows. The data source is an on-premises SQL Server based data warehouse. However, in the case at hand, the processing time of the cube doubled with the migration.

This post aims to explain how to investigate the details and present the solution that worked in this case.

Analyzing the Query Performance

Power BI integrates with Azure Log Analytics: one can connect a Power BI workspace hosted on a Premium capacity to a Log Analytics workspace. Details can be found in the documentation (see learn.microsoft.com).

Log Analytics can be queried with the Kusto Query Language (KQL). All logging for semantic models ends up in the PowerBIDatasetsWorkspace table. More details on that table can be found at learn.microsoft.com.

Within that table there are three relevant operations:

  • ExecuteSql: executing the query on the source system
  • ReadData: ingesting the data into the semantic model
  • VertiPaq: compressing a segment in the VertiPaq engine

To create a meaningful analysis, I have used the following KQL query (similar efforts for slightly different use cases have been done by Chris Webb (see [1], [2]) and Phil Seamark (see [3])). In addition to previous work, my query allows you to explicitly see the specific partitions and segments of each table.

PowerBIDatasetsWorkspace 
| where OperationDetailName in ('ReadData', 'VertiPaq', 'ExecuteSql')
| where TimeGenerated > ago(24h)
| extend tableSql = extract(@"[a-f0-9\-]+.Model.(\w+).\w+", 1, XmlaObjectPath)
| extend partitionSql = extract(@"[a-f0-9\-]+.Model.\w+.([\w ]+)", 1, XmlaObjectPath)
| extend tableRefresh = extract(@"<ccon>(\w+)</ccon>' table", 1, EventText)
| extend partitionRefresh = extract(@"<ccon>([\w ]+)</ccon>' partition", 1, EventText)
| extend table = coalesce(tableSql, tableRefresh)
| extend SSASpartition = coalesce(partitionSql, partitionRefresh)
| extend segment = extract(@"for the segment [']{0,1}([0-9]+)[']{0,1} of the", 1, EventText)
| extend GermanTime = datetime_utc_to_local(TimeGenerated, 'Europe/Berlin')
| summarize startTime=min(GermanTime), endTime=max(GermanTime) by table, segment, XmlaSessionId, OperationDetailName, PowerBIWorkspaceName, SSASpartition
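To illustrate what the extract() calls in the query pull out, here is a small Python sketch with hypothetical sample values (the GUID, table name, partition name, and event text are made up for illustration; real XmlaObjectPath values follow the pattern database-guid.Model.table.partition):

```python
import re

# Hypothetical sample values, illustrating the shapes the KQL extract()
# calls parse. These names are made up; real values come from the log table.
xmla_path = "0a1b2c3d-0000-0000-0000-000000000000.Model.FactSales.FactSales 2023"
event_text = "Finished reading data for the segment '3' of the 'FactSales' table."

# Same patterns as in the KQL query, with '.' escaped for strictness
table = re.search(r"[a-f0-9\-]+\.Model\.(\w+)\.\w+", xmla_path)
partition = re.search(r"[a-f0-9\-]+\.Model\.\w+\.([\w ]+)", xmla_path)
segment = re.search(r"for the segment [']{0,1}([0-9]+)[']{0,1} of the", event_text)

print(table.group(1))      # FactSales
print(partition.group(1))  # FactSales 2023
print(segment.group(1))    # 3
```

Because the partition pattern allows spaces ([\w ]+), partition names such as "FactSales 2023" are captured whole, while the table capture group stops at the next dot.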

By using a custom Power BI timeline visual, one can produce an initial timeline such as the following:

Status Quo: timeline of semantic model refresh

In the graphic above one can see that half of the time is spent purely running the query on the source system. Only after that is the data ingested and compressed into the semantic model. The overall load time is about 60 minutes.

Solving the issue

After some research I found the StreamBeforeRequestCompletes setting (see learn.microsoft.com). This setting causes the on-premises data gateway to start streaming the result set before the source query has fully completed, rather than buffering the full result first.
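For reference, the setting is toggled in the gateway's configuration file on the gateway machine and takes effect after a service restart. A minimal sketch, assuming a recent gateway version where the setting lives in Microsoft.PowerBI.DataMovement.Pipeline.GatewayCore.dll.config (check the linked documentation for your version):

```xml
<!-- In Microsoft.PowerBI.DataMovement.Pipeline.GatewayCore.dll.config -->
<!-- Assumption: setting name and file location as documented for recent gateway versions -->
<setting name="StreamBeforeRequestCompletes" serializeAs="String">
  <value>True</value>
</setting>
```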

After restarting the gateway service, the performance chart looks very different:

Optimized Gateway configuration: timeline of semantic model refresh

This reduced the load time by about 50%.

Summary

In case you are loading a significant amount of data into Power BI/Fabric through the on-premises data gateway, consider the StreamBeforeRequestCompletes setting. Moreover, for any performance issue, Log Analytics offers great insights, and hopefully you can solve your issue as well.
