Business rules
Business rules are worth considering in isolation because they are a frequent cause of frustration when implementing business intelligence. The question that we must consider is, when and where should we define business rules in the business intelligence solution?
The answer is dependent on the stability of the business rule (whether it changes frequently or is reasonable static) and how widely accepted the business rule is within the organisation as a whole.
The mythical one version of the truth
"When I use a word,' Humpty Dumpty said in rather a scornful tone, 'it means just what I choose it to mean — neither more nor less." Through the Looking-Glass, and What Alice Found There – Lewis Carroll
One of the greatest myths pervading business intelligence is that there is one version of the truth within an organisation, just waiting to be found. Ask any senior manager whether there should be one version of the truth and they will immediately agree, citing numerous frustrations caused by the lack of uniformity. Many a time in my days as an analyst did I wish for one version of the truth when I produced a report and was told it was incorrect because the revenue figure in the report did not agree with the revenue figure in another report. However, when I looked into the discrepancies between two apparently similar reports there were good reasons for the differences. The reports had different audiences with a different objective.
One possible solution is to create a standard set of names for measures and other commonly used attributes. So sales might breakdown into: financial accounts sales; management accounts sales; sales for the purposes of assessing sales rep performance (excluding rebates and indirect commercial discounts); sales reported to industry survey provider (according to their rules at the time); sales for the purposes of billing a customer on a long term agreement; sales for the purpose of billing a customer with a volume rebate.
An organisation may have hundreds of these measures and attributes, each having many derivatives with slightly different nuances. Just compiling and agreeing on a shortlist is a stressful exercise. Maintaining the list is difficult and using them in practice is more complex still. For the sake of report presentation, it is impractical to augment every label in a report or dashboard with a full and unequivocal name.
Perhaps the biggest issue is that the one version of the truth can shift over time due to regulatory changes, new industry practices, or management personnel changes. If you are storing historical data in your data warehouse, do you report historical figures according to the old truth or new truth?
When a person speaks of one version of the truth, the best we can normally say is that it represents the truth as they see it at the current point in time. Striving for one version of the truth, even if it is a point-in-time version, is a worthy objective. However, we should not place too much reliance on the stability of the truth when we implement it as business rules in our data strategy.
Applying business rules
The previous section sets the scene for a pragmatic approach to applying business rules in the data strategy.
BI in practice
Applying business rules too early
A company requires a daily snapshot of the number of employees in the organisation. Everyone agrees that an employee is someone who works part time or full time, but contractors, and those on maternity leave should not be included in the count.
The payroll system has a flag to differentiate the employment types. If an employee’s status changes from contractor to part time, the system overwrites the current value with the new one. The daily extract provides a snapshot of the number of employees with the relevant flag value for each day.
For a couple of years the company extracts a count of all the employees that are full time or part time on that day. A change in government regulation means the company must now treat those on maternity leave as employees for reporting purposes so they change the daily extract logic to include them.
The history of daily count figures now represents two different calculated values. We cannot go back and recover a history of employees who were on maternity leave because the system only records a point in time status value. Equally, historical trends are difficult to interpret because an awareness of the change in business rules is required.
In this scenario, the problem occurs because the data is summarised and filtered before it reaches the DW. This is not good practice and an experienced practitioner would probably avoid it.
Nonetheless, we could have a similar situation where we extracted all the employee records, but then applied a business rule to the employee status to update all employees that are not full time or part time to ‘Non-employees’. In this instance, we have done the right thing by extracting the detailed data, but still lost the history of the employee category by applying a categorisation business rule to the column and not preserving the original data.
The Figure below shows a simplified version of the generic BI solution architecture (see Business Intelligence Solution Architecture, for more detail) with guidance on where to implement business rules.
There is an obvious trade off in the placement of business rules. The closer to the source system the rule is enforced, (towards the bottom of the architecture diagram above) the greater the chance of consistency across your reporting and analysis. On the other hand, if the business rule changes or becomes inapplicable to some parts of your reporting and analysis, the cost of changing the rule is far greater if it resides in a layer with a number of dependent layers above it. As a rule, it is far cheaper and easier to adjust calculations or groupings enforced in a couple of reports or an analytical database, than it is to make the same changes in the ETL or DW because of the chain of dependencies.
At the DW layer, my preference is to avoid filters, groupings, and categorisations that overwrite source system classifications. It should be possible to query the DW and reconcile the results back to the source system without having to perform acrobatics with the data. This is a very important part of user acceptance especially in the early stages of a BI project. If users do not trust the data, they will continue to use their own methods to extract it.
Analytical databases are ideal for applying filters, custom groupings, calculations, and algorithms. They have a more expressive query language than standard SQL, and because they are not the permanent historical data store, there is less impact if you have to rebuild them. You should probably expect to build a number of analytical databases for different objectives; each one can apply department or business unit specific logic.
Business logic in reports
Some practitioners have a strong objection to putting business logic in the reporting layer citing inconsistency and maintenance issues if the calculation logic changes. There is truth here, but I think it is over emphasised for various reasons. Firstly, it assumes that there will not be a maintenance issue in the dependent reports if the business rule resides in the database, which is not the case (see BI in practice example, below). Secondly, most reports have a far shorter useful lifespan than their actual lifespan. Recipients often cancel or significantly modify reports when they require maintenance. Finally, the argument assumes that the cost of putting all business logic in a database – be it relational or analytical – is significantly lower. Most reports belong to a small number of users who should know and be able to validate the business logic. If we were to squeeze all the business logic in all the reports of any organisation into a single database, the result would be thousands of lines of code that unless fastidiously maintained, would soon become impenetrable. When reports lack traceability users tend to go back to the source for their enquiries.
BI in practice
Calculation logic implicitly embedded in a report format
An analytical database for a department store defines the following calculation:
Revenue = Sales + Rental Income (from subleasing shop space)
Here is the format for one of their reports that uses the calculated revenue figure and then reports rebates and other discounts as separate items:
Revenue: | 4000 |
Less | |
Rebate: | 1000 |
Discounts: | 500 |
Revenue less discounts: | 2500 |
They decide to modify the calculation of revenue to include rebates.
Revenue = Sales + Rental Income – Rebates
They adjust the calculation in the database. They also adjust the dependent calculation: Revenue less discounts.
However, the report format has not changed and still anticipates rebates as a separate line item. The report looks inconsistent now that rebates are part of revenue.
Revenue: | 3000 |
Less | |
Rebate: | 1000 |
Discounts: | 500 |
Revenue less discounts: | 2500 |
Although the calculations for each individual value in the report are correct, they still need to modify the report to make the report layout consistent with the data.
The modified version of the report format (below) removes the rebates as a separate line item because it is now included as part of the revenue calculation.
Revenue: | 3000 |
Less | |
Discounts: | 500 |
Revenue less discounts: | 2500 |