The Microsoft Power BI platform provides a variety of architectural options for analytics solutions. These options can be “mixed and matched” as required to satisfy a wide variety of requirements. This flexibility is an obvious strength of the platform, but how does one determine which architecture configuration will best fit his or her needs?
In this blog I’ll describe three variations of Power BI architecture using real world scenarios to illustrate how and why they work. I’ll limit the scope of this blog to architecture considerations and avoid delving into implementation details. However, I will provide links to additional resources for a deeper dive into related topics.
Why is this topic important?
Architecture design decisions are foundational and there’s a good chance you will live with them for a long while. These decisions will directly affect many other design and implementation details that will follow. Therefore it’s important to understand the implications and trade-offs of your design choices before you make them.
First, a few definitional terms and concepts relevant to our subject matter.
Data Source – The source of any and all data that are used to populate data models and datasets.
Data Model – The structure used by an analytics engine (e.g. Power BI Service, SSAS, Desktop and Excel) to store data and render query results. The model defines tables, relationships and calculations. When published to the Power BI Service, data models become datasets.
Dataset – Data that are exposed in the Power BI Service as a result of publishing a solution, or from a Get Data process. Datasets provide the data that are used to render visualizations, reports and dashboards.
Resident Dataset – Data that are physically stored in the Power BI Service (PowerBI.com).
Non-Resident Dataset – Data remains on premise or in some form of cloud-hosted service such as Azure SQL Database, Azure SQL Data Warehouse or Virtual Machines.
What are the top three factors for determining an appropriate architecture configuration for Power BI?
- Location of your source data (e.g., On premise, Cloud, Third-party SaaS, Files, Web, OData).
- The method you choose for integrating source data: Import, Live Connection or Direct Query.
- Desired location of the data in your Datasets: Either it lives in Power BI (Resident) or it remains in the source systems (Non-Resident).
Note that there are interdependencies between each of these factors. For example, the location and type of your data sources dictate which methods for data integration are available, and methods you choose for data integration dictate the location of the data in your published Datasets.
With regard to item three, implementation scenarios in Power BI take one of two forms: Residentand Non-Resident. With either variation, your dashboards, reports and visualizations live within the Power BI Service but there is a difference in where the data in your datasets lives when your solution is published. Resident or Non-Resident is a function of how you connect to your data sources (Import, Direct Query or Live Connection).
For more information on data integration modes check out my related blog:
In the scenarios featured below, I’ll showcase examples of both Resident and Non-Resident architecture configurations. I’ve chosen three examples of recent work we have done for clients that represent probably eighty to ninety percent of the implementations people are doing these days in Power BI. I’m choosing not to provide examples of Embedded, Streaming or Rest API implementations. Watch this space for a follow-up blog on these topics.
The examples shown include a typical Resident (Import) architecture model, and two variations of Non-Resident architecture (Live Connection and Direct Query).
Scenario 1 – Resident Data Using Import
A national media publishing firm with operations in six major metropolitan markets needs to share operational dashboards and reports throughout the company. The company has minimal IT Staff and no desire to build additional on premise infrastructure. Power BI provides dashboards and reports for management and staff, organized by division.
Design Factors, Requirements and Constraints
- Operational data lives in multiple cloud-hosted SaaS (SQL Server) databases
- Nightly consolidation of SaaS databases is required for a comprehensive view of the operational data
- No existing data warehouse and no immediate plan (or budget) to build one
- Internal IT support and maintenance required for the solution must be minimal
Case for Resident Data Option
- Data must be “mashed up” from multiple data sources (not possible with Direct Query or Live Connection)
- No need or justification for implementing on premise SSAS (e.g., Row Level Security not required)
- Building ETL and the dimensional model in PBI Desktop is completely feasible
- 24-hour latency of the data (nightly refresh) is acceptable
- The size of the dataset is less that 1GB (current size limitation for resident data in the Power BI Service)
- No issues preventing the use of the On Premise Gateway for enabling scheduled refresh
- Provisioned and deployed an Azure Hosted VM with SQL Server for staging operational data. Very basic SSIS package implemented to download database backups from the SaaS provider nightly
- Used Desktop/Power Query to design, build and populate a dimensional data model from staged data
- Used DAX to implement metrics and complex calculations
- Created multiple report books with multiple report tabs created, targeting specific divisions
- Created Group Workspaces via O365 Groups for each division
- Consolidated Exec dashboards for 360 view of the operations data
Scenario 2 – Non-Resident Data Using SSAS and Live Connect
A Large non-profit health care organization delivers a range of clinical services to underserved communities. Analytics is essential for monitoring results and optimizing outcomes. An on premise data warehouse provides access to curated data and is the primary source of data for analytics. Power BI provides dashboards, reports and R analytics to people throughout the organization. Analytics are tailored to address user requirements based on functional areas and job roles.
Design Factors, Requirements and Constraints
- Expose data for dashboards and reports in the cloud from the on premise enterprise data warehouse (EDW)
- To avoid potential HIPAA issues, keep the data on premise
- For security and compliance reasons, implement row-level security based on AD security groups
- Integrate predictive analytical capabilities through R
Case for Non-Resident Live Connection Data Option
- Robust EDW exists with complex ETL processing handled
- Data must remain on premise, cloud-resident data is not an option
- Row-level security required
- Power BI Reports and R visualizations are published to Power BI Service
- SSAS (Tabular) replicates the dimensional data model and data from the EDW
- SSAS implements security roles for data access and HIPAA compliance
- On Premise Gateway (aka Enterprise Gateway) provides Live Connection from Power BI Service to the on premise SSAS instance.
Scenario 3 – Non-Resident Data Using Direct Query and Azure SQL Database
A large oil company needs web-based analytics for field operations management staff. Source data from the field operations are captured in Excel which must then be transformed and consolidated in a database in order to enable analytics.
Design Factors, Requirements and Constraints
- Data for analytics must be cloud-hosted because the field operations does not maintain sufficient IT infrastructure for housing the database on premise
- Low latency data requirements (hourly)
- Excel is the data capture tool
Case for Non-Resident Direct Query Data Option
- Cloud database platform required for storing operational data
- Complex DAX is not required to fulfill analytics requirements
- Data model is not dimensional; the data is stored in normalized single table format allowing for simple queries (and good performance)
- While the Import method is an option, Direct Query is simpler and will suffice
- Custom ETL process created to upload Excel data to Azure on an hourly basis
- Azure SQL Database used to house the operational data
- Reports authored in Power BI Desktop but this was a choice, not a requirement. Because there is no requirement for DAX, or for defining relationships in a localized data model, the reports could have been created in the Power BI Service.
Selecting the most appropriate architecture configuration for your Power BI solution is partly a matter of choice and partly a matter of circumstance. Here are ten questions you will need to consider as you contemplate your options:
- What type of data sources will you be integrating?
- Where does the source data reside?
- What is your preferred location for the data in your datasets – Resident or Non-Resident?
- Do you want to minimize on premise infrastructure?
- Will you be combining data from multiple data sources?
- How current must your data be?
- Do you have a mature data warehouse?
- Do you need row-level security?
- Will your users have Pro licensing?
- Is OneDrive an option for file storage?