We’ve posted some thoughts below on new product releases, the benefits of product upgrades and added some development tips we find useful……. Let us know if there’s a topic you’d like us to cover on firstname.lastname@example.org
How to get your head round Dimensional Reporting
Thursday, 21 September 2017
In my professional life as a BI consultant I have met many individuals who find it hard to understand the concept of dimensional data. If you’re reading this then the title has caught your attention and you too are looking for some friendly, non-techie advice on how to get your head round this intriguing subject.
So, where to begin. Let’s start with a single piece of data. Somebody has sold you a new car. Somebody somewhere will be interested in when it was sold, what colour it was, how much it was sold for, and what brand it was. From this you should be able to see two distinct types of information. The first is called a measure, and it is invariably a number, something that can be summed, averaged, or counted. The second is referred to as a dimension, and can be thought of a something that gives context to the measure. In our case the measures are the sale price, that 1 car was sold, and the value of the discount applied. The dimensions would be the name of the salesman, the day it was sold, the brand of the car and the site from which it was sold, to name a few.
Thus, a measure is one single item of data, and it is associated with any number of dimensions that give it context. A hamburger is sold for £2.50 at the Reading restaurant on Tuesday evening by Michael. The measure would be Sales Value (£2.50), and the dimensions would be product (hamburger), Site (Reading), Date (Tuesday), and Server (Michael).
You should by now get the idea that a measure can have any number of dimensions. However, if we limit ourselves at this point to two, then we can envisage the data set into a familiar grid, the sort of things that Excel, pardon the pun, excels at. For instance, using the car showroom example, we might have columns indicating brand, with rows indicating salesman. The cells in the spreadsheet could be total sales value for each brand for each salesman, or the numbers sold. Each cell in the table is given context by which column it is in and which row.
This sort of information may be interesting for the branch manager, but what if you happen to be the managing director of the franchise. I’d bet you want to see how each garage in your network is performing, which means adding another dimension, site. Conceptually this is like adding a worksheet to your workbook. You can begin to think of your data as a stack of worksheets forming a three dimensional structure. Each bit of info would be identified by the three dimensions of site, brand and salesman. But now what happens when you want to add another dimension? What if you need to see how sales have changed over time? This is where you have to pretend that a ‘cube’ of data can have more than 3 dimensions. Whilst not physically possible to visualise, it’s entirely plausible to have a cube with many more than 3 dimensions.
So what is the point of all this dimension guff?
In a word, it is consolidation, or the action of combining a number of things into a whole. Deep in the heart of our car sales ‘cube’ is the single sale of a car. This has come from a point-of-sales data source, and it might be one of thousands of data entries. Recorded with this sale will be the what, when, where, who and possibly even a why. With all this information to play with a user can now start to analyse the data using the dimensions. How many cars were sold in January by site? The cube will consolidate all the salesman and brand data to give an answer. Have sales of red cars improved over the last 6 months? The cube will roll up all the sites, brand and salesman data to find the answer. Which site was best at selling cars on a Saturday? I’m sure you get the idea. As long as you have the dimensions you can slice and dice the cube any way that gives you an insight into the data.
So at this point you may be asking ‘what is this cube thing you keep mentioning?’ As l suggested, a ‘cube’ is the manifestation of all the dimensional data you have collected. Different reporting tools create cubes in different formats, but the purpose is the same. The source data will have been aggregated across all the possible dimensions and placed in a storage structure that allows a rapid response to queries. Thus, instead of a query having to work through what could be millions of records to find only those that match criteria, then aggregate them up into a result, a cube will instead already have the result. It will have been calculated during its construction.
And having a cube to explore is useful because? Insight. Exploring dimensions can be an art form, and it helps if you know the data you are dealing with. To use our car sales example, say one month doesn’t look like it follows the general sales trend and we want to understand why. We can fix the time dimension for that month, and ‘drill’ through to other dimensions. By adding the site dimension, it might turn out that one of them hasn’t been performing. By adding a brand dimension we might discover that a particular brand hasn’t been selling well. The point is that one doesn’t have to use all the available dimensions at once to explore the data. A subset may reveal a trend, which can be explored by adding and removing other dimensions. A well designed cube will consolidate all of its data into the dimensions that have been identified by the target audience.
And whilst we are on the subject of dimensions, have I mentioned that they can include hierarchies? These add more depth to the dimensions. A hierarchy in this case can be thought of like a grandparent-parent-child relationship. A site can be in an area, an area can be in a region, and a region can be in a country. Things can get complicated if some sites are based in a region rather than an area (referred to as a missing level, or possibly a ragged hierarchy), or if one site appears in more than one area (multiple parents). This usually results in a lot of furrowed brows and some big bills. Try to avoid!
Exploring further, one additional concept you may hear regarding dimensional data is its granularity. This can be important because it can prevent measures being put together in the same dimensional model. For instance, a common example is sales data versus budget data. Sales data may be available by site, by person, & by day, but budget data may only be available by site and by day. This is because budgets are not defined with the same granularity as sales. The answer is to provide a cube model that has only the common dimensions pertaining to the two measures.
So there you have it. If you are familiar with columns and rows of information in a spreadsheet, then think of dimensional data as simply a stack of spreadsheets inside a stack of workbooks inside a stack of boxes.
Keep these points in mind and you won’t go far wrong. If in doubt, call us on 01635 889222 for more advice.