Power BI is an awesome tool. In fact, as I was writing this, I saw that Microsoft is once again a Leader in the latest Forrester Wave, reflecting the strength of vision and strategy behind the product. There is a note of caution in the data, though, and it is something Microsoft should really pay attention to. Fingers crossed that in the coming 12 months they will address the weakness in customer engagement.
For simplicity, we will just talk about PBIX files in this document. The lessons here do apply to the new Project file type, although there are subtle changes – changes that can only really be appreciated once you understand the true PBIX problem.
Power BI only has a single file type. However, that file type performs two (ok, three) tasks: Data Storage and Data Visualisation (ETL processing is of course the third task). This means that a single Power BI file holding data and reports can be of a significant size. Looking at the work we have undertaken, the largest file we have is an early test model (it is not uncommon to have a huge file and then optimise as you focus towards delivery). That file is 4GB, with 99.99% of that file likely being just data. Power BI Pro limits you to a 1GB model size, so let us stick with that from now on. Metadata and visualisations, even for a large report, rarely exceed 5MB, but let us be generous and allow 10MB. A 1GB file therefore breaks down as roughly 10MB of metadata and visualisations and around 990MB of data.
We all know that it is vitally important to have files backed up for protection. So what do we need to back up from the above file? The simple answer is around 10MB worth. However, in order to back it up, we must back up the whole file. If we were able to track changes to the file, we would end up backing it up every time it was refreshed. That is not a real concern, as we should not be regularly refreshing files offline, but the backup “efficiency” is still only about 1%, because all the data should already exist within another backup.
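The arithmetic behind that 1% figure can be sketched in a few lines. This is only an illustration, assuming the 1GB file is treated as 1000MB and using the generous 10MB allowance for metadata and visualisations; the function name is ours and not part of any Power BI tooling.

```python
def backup_efficiency(unique_mb: float, total_mb: float) -> float:
    """Share of a backed-up file that is not already held in another backup."""
    if total_mb <= 0:
        raise ValueError("total_mb must be positive")
    return unique_mb / total_mb


# Assumed figures from the text: 10MB of metadata and visuals in a 1000MB file.
efficiency = backup_efficiency(10, 1000)
print(f"{efficiency:.0%} of each backup is content not protected elsewhere")
```

Every time the whole file is backed up, the remaining 99% is data that should already be protected at its source.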
Attempts to manage this with Change Data Capture and tools like GitHub ultimately fail, because the amount of data makes cloud backup slow as well – meaning that it is extremely challenging to get the file into the cloud repository and synchronise it. A simple data refresh will also be considered a version change in these contexts, further adding to the administrative load. The simple truth is that these tools are built to cope with small files, and even the 10MB for visualisations, metadata and schema is technically pushing the limits. The synchronisation of 10MB, even on a slower connection in a modern office, should be completed in a matter of seconds, while 1000MB would take around 100 times longer, because transfer time grows in proportion to file size.
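As a rough check on those synchronisation times, here is a minimal sketch of the ideal transfer time for each file size. The 20 Mbps upload speed is an assumption chosen purely for illustration, and the calculation ignores protocol overhead, retries and throttling, so real figures will be worse.

```python
def transfer_seconds(size_mb: float, link_mbps: float) -> float:
    """Ideal transfer time: megabytes converted to megabits, divided by link speed."""
    return size_mb * 8 / link_mbps


LINK_MBPS = 20  # assumed upload speed, for illustration only

small = transfer_seconds(10, LINK_MBPS)    # visualisations and metadata only
large = transfer_seconds(1000, LINK_MBPS)  # the full 1GB file
print(f"10MB: {small:.0f}s, 1000MB: {large:.0f}s, ratio: {large / small:.0f}x")
```

Whatever link speed is assumed, the ratio between the two stays the same, because it depends only on the file sizes.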
This is why Geordie Consulting recommend splitting the Visualisation loads from the Data (and ETL) loads. The major benefit of this is that the Data Load rapidly becomes a very stable area, where a more traditional “Enhancement” process can be used to manage the process of evolutions and updates. This is extremely suited to the deliberate nature of data model changes, especially as the model matures, meaning changes must be coordinated across the Centre of Excellence to ensure success. While it is possible to use the OneDrive synchronisation features, these are still in preview and for a larger model the synchronisation times may be significant.
Visualisations are ultimately the real challenge for any organisation. These are where people “see” the data, and so where complaints will arise. Let us assume that we have managed to get a great Data Strategy in place. Data model changes are managed through an enhancement process, including change management for the deployment to production. So our visualisations should be super simple, right? The answer here is “Not really”. At a simple level, Visualisation-only files (files that are connected to a published Power BI Semantic Model) are rarely more than 5MB in size, so they can be easily backed up and placed under version control. We could use SharePoint Version History at the simplest, or a more complex GitHub or other workflow to manage the versioning of our files, and that will work. The workflow is simple:
1. Create File and connect to Semantic Model
2. Add pages and visualisations as required or requested by customer
3. Publish to service
4. Update App
5. Check-in file
All of that seems simple; however, at steps 3 and 4 there is a problem. The Power BI service by definition works with a duplicate of the file, meaning the published report is no longer tied to the file. The Power BI Developers in the audience will have experienced the “That’s great but…” moment of Power BI delivery, when you show the new report you have built to the target audience and they have some minor changes. It may be as simple as “Can you change that Pie Chart to a Tree Map?” In that case it is not uncommon for people to dip into the file online, select edit, make the change and keep the audience happy. That leads to a difference between the file in SharePoint and the published file. Experienced developers will always take a copy of the file after the go-live presentation and ensure that it is uploaded to the repository, regardless of whether they think they made a change or not.
The challenge comes in six or more months, when a report change is requested. Do you KNOW that no-one has “tweaked” the report and not updated the offline version? There is no readily available view to ascertain when a Visualisation-only file was published to the service (or updated). This means that our recommended best practice for this situation is always the same: assume the online version is different, and download it and replace the repository version with the online content as step one of the update cycle.
Power BI Apps are even worse for this as the version that the app displays may be “behind” the workspace version, and there is no way to download the App version.
In our diagram we can see that the file has been updated in the Power BI Workspace, but that not all updates have made it to the App or to the “offline” repository (SharePoint or GitHub). Even more concerning, the App is locked at V3, a version that we have no backup of. When the upgrade is completed, we would expect that to be V6 and the cycle would begin again at the “Launch” stage – so potentially content that the audiences “like” or even required has been deleted or lost.
The shadow copy of visualisation content that exists within Apps is the elephant in the room as far as Apps are concerned. The challenge you have is that there is a logical reason for it, and that logic demands that the behaviour remains. The shadow copy functionality enables Apps to be prepared, and the content tested while the end users are still using the previous version. It then also ensures that their links continue to work when the app is updated. So we can see that this functionality is by design, and that our lack of discipline is ultimately behind the issue we face. However, that does not help when your customers are complaining that a visual they relied upon has disappeared and you have no record of that visual.
Group Management and controls of access to workspaces do help but they do not prevent people from making well-meant mistakes.
Every experienced developer has made the mistake of updating a file online only. This does mean that if an App exists, you need to go page by page through the App and Workspace versions of the reports to make sure they are the same. It would be nice if it were possible to recover the shadow copy version but, to date, no tool exists (that we are aware of, so if you know of one please share).
Understanding that there is no way to 100% manage the Power BI Workspace content and shadow content makes it much simpler for us to fully embrace the challenge of our Power BI file content. We can recognise that there need to be two streams of file management: the very structured and rigid controls of our Semantic Models (or Data PBIX files) and the far less structured approach that we end up with for our visualisations. Calling it a less structured way of working is not a failing; it is an acknowledgement that we cannot fully control the environment, so we must assume “taint” at all points of contact. Restrictions MUST be applied to who can place content into your App Workspaces, especially as your organisation’s utilisation of Power BI expands. The workspace will contain content from multiple workstreams and models, meaning an inadvertent App update may take things live that should not be live as yet. Those who place content in Workspaces that host apps must also ensure that the app is updated as soon as possible after the update of any file, again to minimise the risk that the shadow copy is different from the workspace copy.
Critically, ALL visualisation updates must start by downloading the workspace copy and updating your “offline” repository with it.
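That first step can be partly scripted once the report has been downloaded from the workspace by hand. The sketch below is a minimal illustration rather than a finished tool: the function names and file paths are hypothetical, and it only compares the two files byte for byte. A mismatch does not tell you what changed, and two files with the same visuals may still differ at byte level, so the safe default remains to overwrite the repository copy.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1MB chunks so large files are not loaded into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def refresh_repository_copy(downloaded: Path, repository: Path) -> bool:
    """Replace the repository copy with the downloaded workspace copy.

    Returns True when the files differed or no repository copy existed yet.
    """
    differed = (
        not repository.exists()
        or sha256_of(downloaded) != sha256_of(repository)
    )
    if differed:
        repository.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(downloaded, repository)
    return differed
```

A call such as `refresh_repository_copy(Path("Downloads/Sales Report.pbix"), Path("repo/Sales Report.pbix"))` (both paths are placeholders) returns True when the repository copy was replaced. A True result is a prompt to review and check in the new version before starting any further changes.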
Geordie Consulting has the expertise in Power BI best practices and can provide guidance on maintaining consistency between your Power BI files in SharePoint or GitHub, and the published files in the Power BI service. We offer comprehensive support and training services to empower your team to manage Power BI workspaces and apps effectively. Our training sessions cover best practices, troubleshooting, and the latest features and updates in Power BI. We also provide ongoing support to address any issues that may arise, ensuring that your organisation can continue to rely on Power BI for its data visualisation needs. Managing Power BI files and apps can be challenging, but with the right partner, it becomes much more manageable. Geordie Consulting offers the expertise, processes, and support needed to ensure that your Power BI environment remains consistent, accurate, and up-to-date. By working with us, you can focus on leveraging the full potential of Power BI to drive insights and make informed decisions, confident that your visualisations and reports are in good hands.