We’ve posted some thoughts below on new product releases and the benefits of product upgrades, and added some development tips we find useful. Let us know if there’s a topic you’d like us to cover at email@example.com
Preparing your data for reporting
Monday, 06 November 2017
Lots of clients ask for our advice on how best to prepare their data for reporting, to make sure they can get the information they actually want. By first thinking through the questions that need answering, data can be properly prepared to deliver speedier, more accurate reports with no extraneous information. In this article I’ll explain some of the concepts involved and suggest a few good practices to take into account. I’ll also elaborate on some of the terminology that gets used, just in case it all sounds a bit like technobabble.
The first thing to highlight is that it’s never a good idea to use transaction data directly as a source for reporting. Transaction data is usually stored in an Online Transactional Processing system, or OLTP for short. This is designed for speedy inserts of data, and may have large numbers of concurrent connections. Compare this with a data source designed for reporting. Here the principal requirement is to be able to sift through all the pertinent records and pull out only those that meet a given set of criteria. Often referred to as an Online Analytical Processing system (or OLAP for short), the design requirements are quite different from OLTP, and ideally a data store should be one type or the other.
The key characteristic of an OLAP record (the type we ideally want to use as a reporting source) is that it is made up of two types of element. The first is a measure. This is a number, something that can be summed, averaged or counted. Think sales value, discount applied, or covers sold, in a restaurant context. Or it could be river flow speed, biscuits baked or electrons on a detector. You get the idea. The second element is called a dimension, and it gives context to a measure, the what, why, when and where if you like. Think of the date that a sale was made, the place where it was sold, and the person who made the sale. Dimensions help us understand the measures we are looking at.
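To make the measure/dimension split concrete, here is a minimal sketch in Python using made-up restaurant figures (the field names and values are purely illustrative):

```python
# A hypothetical OLAP fact record for restaurant sales.
# Dimensions give context (the what, when and where);
# measures are numbers that can be summed, averaged or counted.
fact = {
    # dimensions
    "date": "2017-11-06",
    "site": "Newbury",
    "product": "Fish and Chips",
    # measures
    "sales_value": 124.50,
    "discount_applied": 6.20,
    "covers_sold": 11,
}

# Measures aggregate; dimensions are what you filter and group by.
daily_facts = [fact, {**fact, "sales_value": 88.00, "covers_sold": 7}]
total_sales = sum(f["sales_value"] for f in daily_facts)
print(total_sales)  # 212.5
```

Notice that nothing about the dimensions is ever summed; they only slice the measures.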
This leads on nicely to what really should be the most important part of preparing data for reporting – what questions do we want to ask? For instance, if we want to know which restaurant is doing best, we need to know the sales for each restaurant, which gives us a sales measure and a site dimension. If we want to know the trend for sales, we can add a calendar dimension, so that we can see how sales have changed over time. If we want to know which menu items sell best then we need to add a product dimension. We can add as many dimensions as we need to answer the questions being asked, but keep in mind the requirement that each measure used must have the same set of dimensions. Don’t try to combine something like menu item sales by site by day with energy consumption by site by day, not unless you have the energy consumption figures broken out by menu item! In a similar vein, don’t be tempted into reporting on two measures that have the same dimensions, but at different levels. By this I mean, don’t try reporting on total sales by day combined with budget sales by week. Either break down the budgets into daily figures, or roll up the sales into weekly totals. Thus, when we know the questions, we can think about the measures and dimensions. Don’t be put off when you realise you may well need more than one reporting model to answer all of the potential questions.
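The “same level” rule above is easy to demonstrate: daily sales can be rolled up to weekly totals before being set alongside weekly budgets. A small sketch with invented figures (all dates, weeks and values are hypothetical):

```python
from collections import defaultdict
from datetime import date

# Illustrative daily sales figures.
daily_sales = {
    date(2017, 10, 30): 1200.0,  # Monday, ISO week 44
    date(2017, 10, 31): 950.0,   # Tuesday, ISO week 44
    date(2017, 11, 6): 1400.0,   # Monday, ISO week 45
}
# Budgets only exist at the weekly level.
weekly_budget = {44: 2000.0, 45: 1500.0}

# Roll the daily measure up to the weekly grain so both
# measures share the same dimension at the same level.
weekly_sales = defaultdict(float)
for day, value in daily_sales.items():
    weekly_sales[day.isocalendar()[1]] += value

for week in sorted(weekly_budget):
    variance = weekly_sales[week] - weekly_budget[week]
    print(f"week {week}: sales {weekly_sales[week]}, "
          f"budget {weekly_budget[week]}, variance {variance}")
```

The alternative, spreading each weekly budget across its days, works just as well; the point is simply that both measures must end up at the same grain before they are reported together.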
So, we have a source of data (OLTP), of which there may be more than one, and we know what our reporting data should look like, our OLAP system. The next step is to get the data from the source to the target. To a large degree what happens next will be determined by what reporting tool you anticipate using. The first step might be to put all the measures into a ‘fact’ table, and all the dimensions into their own ‘dimension’ tables, to create what is often referred to as a ‘star’ schema dataset. Here, every row in the fact table is identified by ‘key’ fields that join to the dimension tables in a ‘one-to-many’ relationship. This bit of jargon simply means that one row in a dimension table can link to many rows in the fact table, while each row in the fact table links to one, and only one, row in each dimension. In other words, the same shop can make many sales, but each sale can only occur at one shop. This dataset would be filled by grouping or consolidating all the OLTP records into the appropriate OLAP dimensions. Using restaurant sales again as an example, there may well be thousands of sales records carrying detail such as time of day, till number, sales person, department, product key, sales invoice and receipt printed. Not all of this detail is required in the OLAP system, so the records are consolidated into perhaps a few hundred rows identified only by day, site and department.
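The consolidation step described above can be sketched with pandas. The column names and figures below are hypothetical; the point is the group-by that collapses detailed OLTP lines into the fact-table grain:

```python
import pandas as pd

# Hypothetical OLTP sales lines: far more detail than reporting needs.
oltp = pd.DataFrame({
    "date":         ["2017-11-06"] * 4,
    "site":         ["Newbury", "Newbury", "Oxford", "Oxford"],
    "department":   ["Food", "Food", "Food", "Drinks"],
    "till":         [1, 2, 1, 1],                  # detail to discard
    "sales_person": ["Ann", "Bob", "Cal", "Cal"],  # detail to discard
    "sales_value":  [12.50, 9.00, 15.25, 4.75],
})

# Consolidate into the fact table: one row per day/site/department,
# summing the measure and dropping the dimensions we don't report on.
fact = (oltp
        .groupby(["date", "site", "department"], as_index=False)
        ["sales_value"]
        .sum())
print(fact)
```

In a real load, the `site` and `department` columns would then be replaced by key fields joining out to their dimension tables, which is what turns this flat result into a star schema.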
Data like this can now be used as a direct source for something like Microsoft’s Power BI or Reporting Services. It can also be used with Power Pivot in Excel. Note, “other tools are available”, but I am a fan of these Microsoft products. Going one step further, this dimensional data can be used to feed Analysis Services cubes or tabular data models. And there is also the option of building IBM Cognos packages, but I’m going to save that for another blog.
So, there you have it. A simple plan to turn a flood of information into a little pot of reporting gold.
• Make sure you understand the questions that will need to be asked, and turn these into suitable measures and dimensions in an OLAP model
• Don’t try to use an OLTP system directly as a reporting data source
• Ensure each measure in each model has the same dimensions at the same level
• You can combine multiple sources into one OLAP model
Keep these points in mind and you won’t go far wrong. If in doubt, call us on 01635 889222 for more advice.