11: Data Analytics

11.1 Data Analytics and Managerial Accounting

11.1.1 What’s All This Noise About Data Analytics?

“Data analytics are the next big thing! They’re big! They’re a thing!”

That’s the opening line of about one quadrillion think pieces about data analytics (or business analytics or big data or maybe a half-dozen other related ideas). Then when you start to read the article, you’re drowned in a lake of jargon. Here’s the executive summary (you know, that supposedly digestible part) of one such piece.

“Management accountants are positioned to play a key role in the implementation and application of business analytics in their organizations as they move beyond traditional, transaction-based accounting to analytics. This emerging trend will transform how management accountants analyze and interpret data for their companies.”

So. Many. Words.

Let’s try to break it down, starting from somewhere concrete and building up to what data analytics means in managerial accounting.

Companies are in an information war. See Chapters 1 through 10 of this textbook. Managers really want to know their cost and revenue functions, so they can maximize profit. Maximize profit better than the next guy and you “win” the marketplace (if not, you may start to go out of business). So, channeling Sun Tzu, information dominance wins the day.
Because of computing advances, there are a lot more crunchable numbers that can tell us something about the firm’s cost and revenue functions. These numbers include lots of data that is not traditional accounting information.
Managerial accountants (or general accountants who have some managerial accounting roles) already live their lives really close to the firm’s numbers. So it’s often our job to do the crunching.*

*Or alternatively, a firm has separate data analytics roles. Still, accountants probably work closely with the data analysts who fill those roles (accountants often call up the data department to say, “Oh, you think crunching numbers is your ally. But you merely adopted the crunchiness. I was born in it, molded by it….”)

11.1.2 Why Now?

It’s not just that the computing power of supercomputers has increased. The rise of data analytics is caused less by advances at the top of the computing power pyramid than by advances at the bottom: high-powered computing has reached the masses. Enormously powerful computers now sit in many people’s pockets, and these smartphones (as well as most modern, super-powered laptop and desktop computers) are connected almost instantaneously to the world through the internet.

“But wait,” you say, “I don’t use my smartphone to do data analytics. I just waste time on social media and take a million selfies (What should my caption be? I want it to be clever.). That’s it, this whole textbook is a lie!”

True, but you are creating data. Widespread computing power (among consumers and businesses alike) means that much more of life is digitally measured, thus creating data that is often analyzed by interested companies.

For example, every email you send creates an entry in a database with a plethora of metadata: originating email address, recipient email address(es), location, the device the email was sent from, the time it was sent, subject line, length of the email body, etc. And that’s without even analyzing the content of the body of the email (which is often done!).

Multiply that data-creation event by many, if not most, of the things you do that involve a computer (including others’ computers, such as when you buy things at a store). You likely create a lot of data every day.

Why do companies care about all this data you’re creating? Because you’re part of their revenue function (either as a customer or potential customer). Or because you’re part of their cost function (perhaps in terms of your costliness as a customer or as an employee for the company or a related entity). My examples have focused on you creating data as a consumer, but the same goes for you creating data as an employee. Now that you’re creating so much data digitally, whatever your role, companies can do a lot of analysis to better understand your role in their profit function, analysis that wasn’t always possible or meaningful just a few years ago.

(Yes, as someone who has created neural networks, I can confirm that increased computing power makes a lot of complex analyses easier. But, more importantly I think, it has created mountains upon mountains of data that wasn’t available to be analyzed before.)

11.1.3 What Does It Mean to “Analyze” Data?

Data analysis usually means extracting useful information from raw data by performing statistical operations on that raw data. These “statistical operations” can be very simple or very complex or anywhere in between.

Useful information is information that better informs one or more decisions (this is very similar to the Chapter 7 discussion about how variance analysis should suggest action). Usually these decisions should match the Chapter 2 definition of a decision: the alternatives should differ in their likely cost and/or revenue.

Here’s an analogy.

There’s a board game called Guess Who? In it, you and your opponent take turns asking yes-or-no questions about an unknown person. Only your opponent can see the person you are trying to guess, and only you can see the person your opponent is trying to guess. Each unknown person is drawn randomly from a common pool.

Your yes-or-no questions focus on the unknown person’s characteristics: “Does this person have glasses?” “Does this person have red hair?” “Does this person watch 24 (i.e. can this person tolerate super-fakey, logic-less shows)?” Everyone in the common pool is on a separate flip-board in front of you, and as the answers come in, you flip down the people you’ve ruled out.

The winner is whoever correctly guesses the identity of his or her unknown person first.

SIDE NOTE: The dominant strategy for this game, mathematically speaking, maximizes the information value of each question: halving the field of possibilities with each question (that’s the maximum informativeness of a yes-or-no question in this setting).
See, if all but one of the people on the board have an ironic beard, ironic glasses, and an ironic sense of superiority, then it’s not much help to ask, “Is this person a hipster?”
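To make “information value” concrete, here is a minimal Python sketch that scores a yes-or-no question by its expected information in bits. It uses Shannon entropy, a standard information-theory measure; the text doesn’t name a formula, so treat this as my illustrative choice. It assumes every face-up person is equally likely to be the unknown person, and the 24-person board is an assumption for illustration.

```python
import math


def question_information_bits(remaining: int, yes_count: int) -> float:
    """Expected information (in bits) from one yes-or-no question.

    Assumes every remaining candidate is equally likely to be the unknown person.
    remaining: candidates still face-up on the board
    yes_count: how many of them would produce a "yes" answer
    """
    if remaining <= 0 or not 0 <= yes_count <= remaining:
        raise ValueError("need remaining > 0 and 0 <= yes_count <= remaining")
    p_yes = yes_count / remaining
    bits = 0.0
    for p in (p_yes, 1 - p_yes):
        if p > 0:  # an answer that can never happen adds no information
            bits -= p * math.log2(p)
    return bits


# Compare three questions on a hypothetical 24-person board.
for label, yes in [("splits 12/12", 12), ("splits 6/18", 6), ("'Hipster?' 23/1", 23)]:
    print(f"{label}: {question_information_bits(24, yes):.3f} bits")
```

The even split earns the maximum of 1 bit, while the hipster question earns only about a quarter of a bit.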

This is like data analysis. Each question in Guess Who? provides useful information because it narrows down an unknown identity, and guessing that identity helps you win the game. Each data analysis can help narrow down the reality of the environment the business operates in, which in turn helps the business win the profitability war.

In short, data analysis provides us with evidence to help us discern what reality we are in (out of all the possible realities).

SIDE NOTE: Using more information economics lingo: given a partitioned set of possible realities, data analysis provides signals that, with the help of Bayes’ theorem, help us refine our likelihood judgments about which reality-partition we are in.
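Here is a minimal Python sketch of that Bayesian refinement over two reality-partitions. The two “realities,” the prior beliefs, and the likelihoods are all hypothetical numbers chosen for illustration; in practice, estimating the likelihoods is the hard part.

```python
from typing import Dict


def bayes_update(prior: Dict[str, float], likelihood: Dict[str, float]) -> Dict[str, float]:
    """Posterior P(reality | signal) over a partitioned set of realities.

    prior:      P(reality) for each reality-partition (should sum to 1)
    likelihood: P(observed signal | reality) for the same partitions
    """
    joint = {r: prior[r] * likelihood[r] for r in prior}
    evidence = sum(joint.values())  # P(signal), the normalizing constant
    if evidence == 0:
        raise ValueError("the signal is impossible under every reality")
    return {r: joint[r] / evidence for r in joint}


# Hypothetical: before the data, the two realities seem equally likely.
prior = {"demand_rising": 0.5, "demand_flat": 0.5}
# Hypothetical signal: a jump in web traffic, assumed far more likely if demand is rising.
likelihood = {"demand_rising": 0.8, "demand_flat": 0.3}

posterior = bayes_update(prior, likelihood)
print({r: round(p, 3) for r, p in posterior.items()})
```

One signal moves our belief in “demand is rising” from 0.5 to about 0.727. A further signal can be handled by feeding this posterior back in as the next prior, assuming the signals are independent given the reality.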

Some quick and varied examples (some strategic, some operational, some internal to the company, some external to the company):

Is the reality that there is profitable space to move downmarket/upmarket?
- Data about potential customer profiles, data about competitors, and data about suppliers can help answer this question.
Is the reality that customers need more/less personal contact?
- Data about customer satisfaction can help with this question (this data includes customer feedback, which can be expensive to process without modern automation).
Is the reality that employees are overworked or underutilized?
- Data about employee attrition, productivity, and sentiment can help answer this question.
Is the reality that the new website layout helps move customers through to checkout?
- Data about customers’ progress through the website can help answer this question.
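For the last question, a common descriptive starting point is a funnel: the share of visitors who make it from each step to the next. The Python sketch below assumes a hypothetical four-stage funnel and uses made-up visitor counts for the old and new layouts.

```python
from typing import Dict

# Hypothetical checkout funnel, in the order customers move through it.
STAGES = ["product_page", "cart", "checkout", "purchase"]


def step_conversion(counts: Dict[str, int]) -> Dict[str, float]:
    """Share of visitors at each stage who reached the next stage."""
    return {
        f"{here}->{nxt}": (counts[nxt] / counts[here] if counts[here] else 0.0)
        for here, nxt in zip(STAGES, STAGES[1:])
    }


def overall_conversion(counts: Dict[str, int]) -> float:
    """Share of visitors at the first stage who completed a purchase."""
    first = counts[STAGES[0]]
    return counts[STAGES[-1]] / first if first else 0.0


# Made-up visitor counts for illustration only.
old_layout = {"product_page": 1000, "cart": 300, "checkout": 150, "purchase": 120}
new_layout = {"product_page": 1000, "cart": 320, "checkout": 200, "purchase": 160}

for name, counts in [("old", old_layout), ("new", new_layout)]:
    print(name, step_conversion(counts), f"overall={overall_conversion(counts):.1%}")
```

A higher rate under the new layout is suggestive, not proof: seasonality or a marketing push could also explain it, which is why firms often run randomized A/B tests before crediting the layout.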

11.2 Data Analytics in Practice

11.2.1 Step #1: Data Collection and Cleaning

11.2.1.1 Why Data Cleaning?

I must always be doing something right in my analysis, because I always get the same answers: #REF!, #N/A, and #VALUE!.

Oh wait, those are Excel error codes. These and other errors are very common, in part because data is not always clean.

Data cleaning means ensuring that the data conform to the required parameters, such as type (date, text string, whole number, floating-point decimal, etc.), completeness (each observation has data in each field), and correctness (the data in each observation match reality).

11.2.1.2 Data Cleaning

This part of the data analytics process is not the prettiest, but it’s essential for later steps. Data cleaning can also be hard to describe. It’s fairly ad hoc. Yes, if there’s an obvious error, you can often trace it back to, say, a date that was typed into a field that should have had a whole number. But often it’s hard to trace back or you don’t get an error message.

Here are some common steps in data cleaning.

Verifying correct data types were entered into the required fields.
Discarding incomplete data, tagging incomplete data (for sensitivity analysis), or even investigating to determine what the missing data should be.
Checking (manually and maybe through sampling) the classification of commonly mistaken items. For example, one of my closing entries when I worked in practice was the “wrong cost center” transaction. We knew the most commonly mistaken cost centers and corrected them.
Regular tests of system validity (as simple as check figures or tying, as complex as computerized system audits).
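A few of these steps can be automated. Below is a minimal Python sketch, assuming each transaction is a dictionary with hypothetical fields `entry_date`, `amount`, and `cost_center`, plus a hypothetical lookup table of commonly mistaken cost centers (like the “wrong cost center” entries above). It checks completeness and types, tags problem rows rather than silently deleting them, corrects known cost-center mix-ups, and ties the result to a control total.

```python
from datetime import date
from typing import Dict, List

REQUIRED_FIELDS = ("entry_date", "amount", "cost_center")

# Hypothetical: wrong cost center -> correct cost center, learned from past closes.
COMMON_MISTAKES = {"CC-410": "CC-401"}


def clean_transactions(records: List[dict], control_total: float) -> Dict[str, object]:
    clean, incomplete, bad_type = [], [], []
    for row in records:
        # Completeness: every required field is present and non-empty.
        if any(row.get(f) in (None, "") for f in REQUIRED_FIELDS):
            incomplete.append(row)  # tagged for follow-up or sensitivity analysis
            continue
        # Type: dates must be dates; amounts must be numbers (bools excluded).
        amount = row["amount"]
        if (not isinstance(row["entry_date"], date)
                or isinstance(amount, bool)
                or not isinstance(amount, (int, float))):
            bad_type.append(row)
            continue
        # Classification: fix known cost-center mix-ups.
        fixed = dict(row)
        fixed["cost_center"] = COMMON_MISTAKES.get(row["cost_center"], row["cost_center"])
        clean.append(fixed)
    total = sum(r["amount"] for r in clean)
    return {
        "clean": clean,
        "incomplete": incomplete,
        "bad_type": bad_type,
        # Tie-out: do the clean rows agree with an independent check figure?
        "ties_to_control": abs(total - control_total) < 0.005,
    }


rows = [
    {"entry_date": date(2024, 1, 5), "amount": 100.0, "cost_center": "CC-410"},
    {"entry_date": date(2024, 1, 6), "amount": None, "cost_center": "CC-401"},
    {"entry_date": "01/07/2024", "amount": 50.0, "cost_center": "CC-300"},
    {"entry_date": date(2024, 1, 8), "amount": 25.5, "cost_center": "CC-300"},
]
result = clean_transactions(rows, control_total=175.5)
print(result["ties_to_control"])
```

Here the tie-out fails by 50.0, which points straight at the row whose date was typed as text.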

The system as a whole can help you with data cleaning by preventing messy data in the first place. Because human error is inevitable, it is essential to provide adequate training on how data must be entered into the accounting system (or other recording systems) and on what the standards are (e.g. what is the threshold for declaring something obsolete, and at what point does an order trigger an accrual?).

Also, the system can force validation while data is being entered. If the system won’t allow a text string in a number field, then that is an error you won’t have to clean later.
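Here is a minimal Python sketch of validation at the point of entry, assuming a hypothetical entry screen that passes the typed date and amount as text. Bad input is rejected immediately, so it never reaches the ledger. The `EntryRejected` exception and `accept_entry` function are illustrative names, not part of any particular accounting system.

```python
import math
from datetime import date
from typing import Tuple


class EntryRejected(ValueError):
    """Raised when a typed entry fails validation."""


def accept_entry(raw_date: str, raw_amount: str) -> Tuple[date, float]:
    """Validate one entry as it is typed, instead of cleaning it later."""
    try:
        entry_date = date.fromisoformat(raw_date)  # expects YYYY-MM-DD
    except ValueError:
        raise EntryRejected(f"not a valid YYYY-MM-DD date: {raw_date!r}") from None
    try:
        amount = float(raw_amount)
    except ValueError:
        raise EntryRejected(f"amount must be a number: {raw_amount!r}") from None
    if not math.isfinite(amount):  # float() accepts "nan" and "inf"; a ledger shouldn't
        raise EntryRejected(f"amount must be finite: {raw_amount!r}")
    return entry_date, round(amount, 2)


print(accept_entry("2024-03-31", "1250.50"))
try:
    accept_entry("2024-03-31", "twelve hundred")
except EntryRejected as err:
    print("Rejected:", err)
```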

11.2.2 Step #2: Data Analysis

11.2.2.1 Types of Data Analytics

A Journal of Accountancy article describes four types of data analytics, which is good enough for my purposes (I’ve put the descriptions of these categories into my own words).

Descriptive Analytics: Describes what happened. Includes sums, averages, percent changes, or other similar simple mathematical operations to describe past results.
Diagnostic Analytics: Examines why things happened. Includes variance analysis and statistical analyses that help isolate the effect of individual root causes.
Predictive Analytics: Predicts what will happen. Can include modeling the expected behavior of operations (e.g. Taguchi loss models), employees (e.g. learning curves), and customers (e.g. modeling the impact of word-of-mouth marketing).
Prescriptive Analytics: Helps inform what should be done. All the above help with this to some degree, but this category specifically includes optimizing predictive models to find the best way forward.
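To make the four categories concrete, here is one tiny Python function per category. Every number is hypothetical, the learning curve uses the cumulative-average-time form (one common form; others exist), and the demand curve in the prescriptive example is an assumed linear model, not an estimate.

```python
import math
from typing import Callable, Iterable, Tuple


def percent_change(old: float, new: float) -> float:
    """Descriptive: how much did a result change?"""
    return (new - old) / old * 100


def price_volume_variance(actual_price: float, actual_units: float,
                          budget_price: float, budget_units: float) -> Tuple[float, float]:
    """Diagnostic: split a revenue variance into price and volume effects."""
    price_var = (actual_price - budget_price) * actual_units
    volume_var = (actual_units - budget_units) * budget_price
    return price_var, volume_var


def learning_curve_avg_hours(first_unit_hours: float, learning_rate: float,
                             cumulative_units: float) -> float:
    """Predictive: cumulative-average learning curve, avg = a * x ** log2(rate)."""
    return first_unit_hours * cumulative_units ** math.log2(learning_rate)


def best_price(candidates: Iterable[float], demand: Callable[[float], float],
               unit_cost: float) -> float:
    """Prescriptive: pick the candidate price with the highest predicted profit."""
    return max(candidates, key=lambda p: (p - unit_cost) * demand(p))


print(percent_change(5000, 6500))                   # revenue grew 30%
print(price_volume_variance(50.0, 130, 52.0, 120))  # (-260.0, 520.0)
print(learning_curve_avg_hours(10, 0.8, 4))         # about 6.4 hours per unit
print(best_price([40, 45, 50, 55, 60], lambda p: 400 - 5 * p, unit_cost=20))  # 50
```

In the variance output, the negative price variance is unfavorable (units sold below the budgeted price) and the positive volume variance is favorable; that split is the “why” that the descriptive total alone would hide.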

The first two (#1 and #2) focus on data about the past, and the latter two (#3 and #4) focus on forecasts of the future. This matches two of the three terms typically used to describe managerial accounting: control (focuses on accountability for past performance), planning (focuses on future performance), and decision making. All four types of analyses feed into decision making.

These analytics include some things managerial accountants have long performed (e.g. sums, averages, percent changes, variance analysis, etc.), some things managerial accountants have become involved in more recently (e.g. Taguchi loss curves and learning curves), and some things managerial accountants are not typically familiar with (e.g. most statistical analysis, complex models, optimizing models, etc.).

11.2.2.2 Mathematical & Statistical Tools

As promised earlier, data analytics can use very simple mathematical and statistical tools.

For example, gross profit is a form of very simple data analytics. It’s simply a difference (specifically, the difference between net revenue and cost of goods sold), but it helps you make sense of what both of those numbers mean for the business as a whole. It’s such useful information that gross profit is a standard line on the multi-step income statement.
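In code, this very simple data analytics is a single subtraction. The Python sketch below also computes gross margin percentage, a common companion ratio (my addition, not from the text), and uses hypothetical figures.

```python
def gross_profit(net_revenue: float, cogs: float) -> float:
    """Gross profit = net revenue - cost of goods sold."""
    return net_revenue - cogs


def gross_margin_pct(net_revenue: float, cogs: float) -> float:
    """Gross profit as a percentage of net revenue."""
    if net_revenue == 0:
        raise ValueError("gross margin is undefined when net revenue is zero")
    return gross_profit(net_revenue, cogs) / net_revenue * 100


# Hypothetical: $250,000 net revenue, $175,000 cost of goods sold.
print(gross_profit(250_000, 175_000))      # 75000
print(gross_margin_pct(250_000, 175_000))  # 30.0
```

The percentage form is what makes gross profit comparable across periods or product lines of different sizes.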

Also, data analytics can be very complicated. Modeling the future can incorporate many inputs, many interactions between those inputs, and complex non-linear effects those inputs and interactions are supposed to have on the eventual predicted outcome. Add in machine learning (to train these models based on prior data) and you end up with a lot of moving parts that can require a lot of advanced knowledge.

In between these two extremes we get the more common varieties of mathematical and statistical tools: variance analysis, descriptive statistics, percent change, t-tests, ANOVA, regressions, etc.
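As a taste of that middle ground, here is a simple linear regression written out by hand in Python and applied to a classic managerial accounting question: estimating a mixed cost function, where the intercept approximates fixed cost and the slope approximates variable cost per unit of activity. The machine-hour and overhead figures are made up. Real statistics tools (including Excel’s regression tool) also report standard errors and fit statistics, which this sketch omits.

```python
from typing import List, Tuple


def simple_ols(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Least-squares fit of y = intercept + slope * x; returns (intercept, slope)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variation, so the slope is undefined")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical monthly machine hours and total overhead cost ($).
hours = [100, 150, 200, 250, 300]
overhead = [5200, 6300, 7100, 8300, 9100]

fixed, variable_rate = simple_ols(hours, overhead)
print(f"fixed cost about ${fixed:,.2f} per month, "
      f"variable cost about ${variable_rate:.2f} per machine hour")
```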

You can perform a number of these analyses in Excel. Here are some links that provide at least an introductory explanation.

Descriptive Statistics: https://www.excel-easy.com/examples/descriptive-statistics.html
t-tests: https://www.excel-easy.com/examples/t-test.html
Simple linear regression (can also do multiple linear regression): https://www.ablebits.com/office-addins-blog/2018/08/01/linear-regression-analysis-excel/
ANOVA in Excel: https://www.excel-easy.com/examples/anova.html
Mixed ANOVA in Excel: http://www.real-statistics.com/anova-random-nested-factors/two-factor-mixed-anova/

Once you get past the simple level of statistical analyses, though, you probably need to look for software specifically dedicated to statistics. Here are some examples.

These programs are not free, but universities often have agreements to let students use them.
- SPSS
- Stata
- SAS
- PowerBI
- MATLAB
- Mathematica
These programs are free (they’re “open,” which I guess is something that’s always being hyped out there).
- Gretl
- R

I tried to order each of the above lists roughly by difficulty/power (my ordering of PowerBI, MATLAB, and Mathematica is not necessarily a strong opinion). The more difficult a program is to learn, the more powerful it tends to be.

SIDE NOTE: More powerful statistics software tends to be more customizable and therefore more difficult to learn. Customization means richness in the programming language or interface you have to learn in order to operate the program. Statistics is applied math. To get the most accurate analysis, you need to customize your statistical tests to match the practical reality of your data set.
For example, you need to run a different type of regression on a data set that has observations from various firms in just one year versus a data set that has observations from various firms over multiple years.
The more advanced statistics packages are just programming languages with statistics presets and you program in the gaps. This requires (1) a knowledge of that programming language and (2) a knowledge of the underlying mechanics of statistical tests. There are no guardrails if you are programming the statistical test from scratch.
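To see why the data’s structure matters, here is a small Python sketch comparing a pooled regression slope with a firm fixed-effects slope (the “within” estimator, which compares each firm only to itself across years). Fixed effects is one common choice for multi-firm, multi-year data, not the only one, and the text doesn’t specify which regression it has in mind. The data are made up so that each firm’s outcome rises by 2 for every 1-unit increase in x, while firm B simply starts from a lower baseline at higher x.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def ols_slope(x: List[float], y: List[float]) -> float:
    """Slope from a simple least-squares regression of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def within_firm_slope(panel: List[Tuple[str, float, float]]) -> float:
    """Firm fixed-effects slope: demean x and y within each firm, then regress."""
    by_firm: Dict[str, List[Tuple[float, float]]] = defaultdict(list)
    for firm, x, y in panel:
        by_firm[firm].append((x, y))
    dx, dy = [], []
    for obs in by_firm.values():
        mx = sum(x for x, _ in obs) / len(obs)
        my = sum(y for _, y in obs) / len(obs)
        dx.extend(x - mx for x, _ in obs)
        dy.extend(y - my for _, y in obs)
    return ols_slope(dx, dy)


# Hypothetical panel: (firm, x, y), three years per firm.
panel = [
    ("A", 1, 12), ("A", 2, 14), ("A", 3, 16),
    ("B", 5, 10), ("B", 6, 12), ("B", 7, 14),
]
pooled = ols_slope([x for _, x, _ in panel], [y for _, _, y in panel])
within = within_firm_slope(panel)
print(f"pooled slope: {pooled:.3f}, within-firm slope: {within:.3f}")
```

The pooled slope comes out slightly negative (about -0.14) even though x raises y by 2 inside every firm. Dedicated packages such as Stata, R, or SAS also provide standard errors suited to panel data (for example, clustered by firm), which this sketch skips.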

11.2.3 Step #3: Presentation of Results

If you complete a great data analysis, but no one ever comes to understand its implications, was it really a great data analysis?

Part of data analytics is presenting results. Presenting results does several things.

You probably can’t present everything, at least not with an equal level of emphasis. Thus you must make choices about the priorities of the data analysis. This forces you to think through the “so what?” of your results. It can refine your understanding of the analysis.
Presenting results allows you (prior to presentation) and your audience (after presentation) to critically evaluate the data analysis. There might be a mistake (hopefully not a huge one!), or there might be a statistical tool that you are currently misunderstanding. Presenting allows you to get at least indirect feedback.
Presenting results is usually a prerequisite for your analysis to make a difference for the firm. Your analysis must be presented to decision makers for it to affect their decisions. Data analysis that doesn’t affect decision making is just an expensive hobby.

How do you present data analysis? It’s possible, I guess, that the decision maker can read the raw analysis output. But that’s less likely, especially if you’re talking about communicating with a wide range of decision makers or a large number of joint decision makers. At least one of the people on the committee will have slept through statistics class (who wouldn’t!).

Here are some typical tools for presenting data analysis.

Tables. A well-organized and well-labelled table can help people see how row categories relate to column categories (and to understand the scale and units).
Figures. This includes charts and graphs. These can quickly convey which categories are larger or smaller than others, and how different categories relate to each other is often easier to see visually than in a table.
Interactive visualizations. For example, this might be a pie chart that allows you to drill down into different subcategories or to filter, in real time, the results the pie chart is based on. If a figure or table can be customized by the audience, it can be a very flexible way of getting right at decision makers’ questions. You don’t have to guess what decision makers want to know; they can explore it on their own.
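Behind an interactive drill-down is a simple pattern: summarize at one level, then re-summarize a filtered subset at the next level. Here is a minimal Python sketch of that logic, printed as a labelled table with units. The sales rows, column names, and the `summarize` and `print_table` helpers are hypothetical; tools such as PowerBI provide this through their own interfaces rather than through code like this.

```python
from collections import defaultdict
from typing import Dict, Optional

COLUMNS = ("region", "product_line", "revenue")

# Hypothetical sales rows: (region, product line, revenue in USD).
SALES = [
    ("East", "Widgets", 120_000),
    ("East", "Gadgets", 80_000),
    ("West", "Widgets", 90_000),
    ("West", "Gadgets", 130_000),
]


def summarize(rows, group_by: str, filters: Optional[Dict[str, str]] = None) -> Dict[str, float]:
    """Total revenue by `group_by`, keeping only rows that match every filter."""
    g, r = COLUMNS.index(group_by), COLUMNS.index("revenue")
    filters = filters or {}
    out: Dict[str, float] = defaultdict(float)
    for row in rows:
        if all(row[COLUMNS.index(col)] == val for col, val in filters.items()):
            out[row[g]] += row[r]
    return dict(out)


def print_table(summary: Dict[str, float], label: str, units: str = "USD") -> None:
    """Print a small, labelled table sorted from largest to smallest."""
    width = max([len(label)] + [len(k) for k in summary])
    print(f"{label:<{width}}  Revenue ({units})")
    for key, value in sorted(summary.items(), key=lambda kv: -kv[1]):
        print(f"{key:<{width}}  {value:>13,.0f}")


print_table(summarize(SALES, "region"), "Region")
# "Drill down" into West by filtering on it and grouping one level deeper.
print_table(summarize(SALES, "product_line", {"region": "West"}), "West product line")
```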