| |

How to handle a large dataset in Power BI Desktop?

A few years back when I was working on my first Power BI project for a large US life insurance client. I was working on a dataset with 40 million + rows and then faced issue in loading data into the Power BI Desktop.

Due to large data in the Power BI Desktop, .pbix file used to consume lot of memory and my laptop used to get hanged or some times .pbix file used to get crashed.

In this article, We will see all the options through which we can handle large datasets/ big data in Power BI Desktop.

Some of the ways using which you can handle large dataset are mentioned below.

  1. Using Import Mode with PPU/Premium License.
  2. Using Live Connection.

Before going deep into this topic we need to understand all the connection modes available in Power BI briefly.

  1. Import Mode: Using Import mode, data is compressed, optimized, and stored in the in-memory on the disk. Once in memory, Import models can then achieve very fast query results. It’s also important to understand that there’s no concept of an Import model being partially loaded into memory.

    When refreshed, data is compressed and optimized and then stored on a disk by the VertiPaq storage engine. When loaded from disk into memory, it’s possible to see 10x compression. So, it’s reasonable to expect that 10 GB of source data can compress to about 1 GB in size. Storage size on disk can achieve a 20% reduction from the compressed size.
  2. Direct Query: Using Direct Query, all the processing takes place on the database. When the users interact with the report then a series of queries are fired back to the database and results are later fetched and shown in the report.

    Data Model does exist in the Power BI but there is some restriction on the use of transformation in Power Query and some of the Dax formulas do not work in the direct query mode. For a large dataset, it is not advisable to use a direct query as the response is very slow.
  3. Live Connection: Some of the tools such as SQL Server Analysis Server (SSAS) and Azure Analysis Server (AAS) provide features to build a tabular data model in it and then it can be used to live to connect with Power BI to display visuals.

Using Import Mode with PPU/Premium License

If you are planning to use import mode which is also one of the most preferred ways of connection as there is a compression and optimization done in the Vertipaq engine making report to get the queried data faster then you get below mentioned single dataset limit on each of the license. You will require to select the right license based on the size of the datamodel and other factors such as number of consumers of the reports, etc.

Pro: 1 GB
PPU: 100 GB
Premium: 400 GB

Now, we will see how to restrict the data in Power BI Desktop so that .pbix file does not hang the computer/laptop and then load the complete dataset in the Power BI Service.

Step 1: Create a Parameter

You need to create a parameter in the Power Query to determine the number of rows you want to bring into the datamodel.

Suppose you have a large dataset having a millions of rows then you can bring a sample of data so that you can check whether transformation applied and Dax formulas written are working correctly or not. So, the appropriate number of rows must be mentioned in the parameter.

One of the important point to make sure is to keep the parameter data type as decimal.

Create Parameter

Step 2: Restrict Rows

In the main Fact dataset, click on the Keep Rows option >> Keep Top Rows to restrict the number of rows in the Power Query.

Restrict Data

Step 3: Restrict Rows using Parameter

In order to enable restriction of rows using Parameter in the Power Query, you will need to make changes in the Advance Editor and change the M Query Code of the Fact Table.

Instead of the Number which you have put in the Keep Top Rows option, place the parameter name so that Power Query can take value of number of rows to keep from the parameter.

Also, write a if else statement stating if the number of rows is greater than 0 then keep the number of rows mentioned in the parameter else load all the data available.

Advance Editor Changes – Place Parameter name in place of Number of Rows to keep inorder to control the number of rows through parameter.

Step 5: Power BI Service Settings – Change the Parameter Value in the Power BI Service

After the making the advance editor changes, click on Apply and Close option to load the restricted data in the data model. Then you can make the visualization based on the sample data loaded and publish the report in Power BI Service.

Go the Workspace where you have published the report and click on Datasets +dataflows options and go to the dataset settings.

Parameter and Datasource Credential Required in Power BI Service

In the Dataset Settings, You will need to fill the parameter value, if it is 0 then all the data will be refreshed and if there is number greater than 0 then rows equal to that number will load. You will also require to fill the credential information by clicking on the Edit Credential option.

If you want to do schedule refresh then you can click on Schedule Refresh option available in the Power BI Service or else you can go back to the dataset and click on manual refresh option available against the dataset name.

Now You will see if you place 0 in the parameter value in the Service then all the data will get loaded in the report. So in this way, you can load a large dataset in the Power BI Report without loading the entire dataset in the .pbix file.

Using Live Connection

Using tools such as Azure Analysis Service (AAS) or SQL Server Analysis Service (SSAS), you can build the tabular data model in these tools and use Power BI to just show the visuals using Live Connections.

If you are using Live Connections then you will not be able to do any transformation in the Power Query or write any DAX measure. All the calculations and transformation will be done in the AAS or SSAS.

Here, apart from the PRO license required to publish the report in the Power BI Service Workspace, there is an additional cost involved for the AAS server calculated on Pay per use model.

Similar Posts

Leave a Reply

Your email address will not be published. Required fields are marked *