A few days ago, I wrote a response to Mark Proctor’s post, in which he talked about processes that used to take 30 minutes now taking just 30 seconds. That is an impressive leap in efficiency, but I couldn’t help asking: why stop at 30 seconds? Why not push further, to 30 milliseconds? Let me explain.

The Power of 30 Milliseconds.

In response to Mark’s post, I asked for a demonstration of reducing process times to mere milliseconds. Nobody offered one, so I decided to demonstrate it myself, using an example that made the rounds in Excel communities a few years ago. The example comes from a LinkedIn post by Wyn Hopkins, in which he showcased how Power Query could import 10 million records into Excel and produce a report. At the time, it was remarkable that such massive datasets could be handled within Excel at all. However, it raises an important question: why does refreshing these queries take 45 seconds when it could be instant?

The crux of the matter is that, as impressive as Power Query is, it’s not always necessary. The long refresh times people complain about are built into how Power Query operates. When you append new data, whether a single record or a million, Power Query reprocesses the entire dataset, so the refresh takes the same amount of time either way: usually 30 to 45 seconds, depending on the size of the dataset. This brings us to the question: is Power Query the best tool for every scenario?

A New Approach: Reporting Without Power Query.

In most reporting scenarios, Power Query is an unnecessary detour. A typical report aggregates, filters, or summarizes data, often without requiring a full ETL (Extract, Transform, Load) process. Power Query, at its core, is an ETL tool, and while that’s invaluable when you genuinely need to transform data, it’s overkill for straightforward reporting.

To prove this, I took Wyn Hopkins’ dataset of 10 million records, added a million more, and built a report that operates without Power Query. When I refreshed the report, it worked instantly, in milliseconds. Whether I was working with 10 million or 11 million records, the refresh time was the same: the blink of an eye.

This was because, unlike Power Query, my report didn’t need to reprocess the entire dataset. It simply aggregated the data already present in the database and updated the summary fields.
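To make this concrete, here is the shape of the query such a report runs. The schema is an illustrative assumption on my part (a Sales table with Region and Amount columns), not the actual structure of Wyn’s dataset:

```sql
-- Hypothetical schema: Sales(Region, Amount).
-- The database engine scans and aggregates in place; only the few
-- summary rows travel back to the workbook, which is why the refresh
-- takes the same instant whether the table holds 10 million rows or 11.
SELECT Region, SUM(Amount) AS Total
FROM Sales
GROUP BY Region;
```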

The Key Differences.

There are a few key points here that I want to highlight:

1. **No Power Query**: My report did not use Power Query at all. Power Query is great for complex transformations, but when it comes to simple reporting, it’s unnecessary.

2. **Milliseconds, Not Seconds**: My report refreshed in milliseconds, compared to the 30 to 45 seconds that Power Query would have taken.

3. **Data Expansion Had No Impact**: Adding an extra million records didn’t slow down the refresh. This highlights the efficiency of reporting directly from the data without needing to reprocess it.

Why the 45-Second Delay Happens.

The delay in Power Query comes from reprocessing the entire dataset every time you refresh the model. If you’re reporting on data that’s already structured and doesn’t need transforming, why go through Power Query at all? The answer usually given in the Excel community is that Power Query needs to refresh the entire model to produce a report. But as I’ve demonstrated, that isn’t always necessary. Excel already has the built-in capability to query data and report on it instantly.

Misunderstanding of Power Query’s Role.

Many people use Power Query for reporting when they don’t actually need ETL functionality. Reporting is about filtering and aggregating data; ETL is about transforming data before loading it into your model. The two are not the same, yet there is a misconception that Power Query is required for both. This results in unnecessary delays and inefficient workflows. Power Query is fantastic when you need complex data transformations, but for straightforward reporting tasks, it’s like taking a detour when you don’t need to.

A Simple Analogy.

Imagine I live in London, and I have a weekly meeting in Brighton. Every week, I take a detour to Hastings to pick up a colleague who doesn’t have a car, and we use the journey to discuss the meeting. That detour is justified because it serves a purpose. But what happens when my colleague is on holiday? Should I still make that unnecessary detour to Hastings? Of course not. I can go directly to Brighton and save time.

The same logic applies here: if you don’t need Power Query’s ETL capabilities, why take the detour?

An Alternative: ADO and Excel’s Built-in Capabilities.

Excel can use ADO (ActiveX Data Objects), a Microsoft data-access library that can be driven from VBA, to query and aggregate data without Power Query. ADO doesn’t have the user-friendly interface that Power Query boasts, but it’s remarkably simple to use with basic VBA. You don’t need advanced programming knowledge to use it effectively, and the time savings speak for themselves.
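Here is a minimal sketch of that approach, using the same illustrative assumptions as the query above: an Access file named sales.accdb sitting next to the workbook, a Sales table with Region and Amount columns, and a worksheet named Report to receive the summary. None of these names come from the original examples.

```vba
' Minimal ADO sketch: run one aggregating SQL query against a database
' file and drop the result onto a report sheet. All names are
' illustrative assumptions (sales.accdb, Sales, Region, Amount, Report).
Sub RefreshReport()
    Dim conn As Object, rs As Object
    Dim sql As String

    ' Late binding, so no ADO library reference needs to be set.
    Set conn = CreateObject("ADODB.Connection")

    ' The ACE OLE DB provider ships with modern Office installs.
    conn.Open "Provider=Microsoft.ACE.OLEDB.12.0;" & _
              "Data Source=" & ThisWorkbook.Path & "\sales.accdb;"

    ' The database engine does the heavy lifting; Excel only ever
    ' receives the handful of summary rows.
    sql = "SELECT Region, SUM(Amount) AS Total FROM Sales GROUP BY Region"
    Set rs = conn.Execute(sql)

    ' Write the aggregated result straight onto the report sheet.
    ThisWorkbook.Worksheets("Report").Range("A2").CopyFromRecordset rs

    rs.Close
    conn.Close
End Sub
```

Because the aggregation happens in the database engine rather than in Excel, adding another million records changes almost nothing: the query still returns the same few summary rows, which is where the millisecond refresh comes from.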

Conclusion.

The point I’m making is that using Power Query for tasks where it isn’t necessary is like taking the long route when a direct path exists. If a 30- or 45-second delay doesn’t bother you, by all means continue using Power Query. But if you value efficiency and speed, and the task is simply reporting, there are better, faster ways to achieve the same result in Excel, in milliseconds. In many cases it’s not about choosing the hard way or the easy way, but about making an informed choice. If you don’t need the detour of Power Query, why take it?

By understanding the strengths and limitations of the tools at our disposal, we can streamline processes and improve performance, going beyond the 30-second refresh time to achieve true efficiency.

You’ve been listening to a podcast by Hiran de Silva. Read by Bill.
