How to use data in your Crystal Ball model?
The following extensive tip is derived from the ModelAssist™ training tool from Vose Consulting™. Readers should consult the ModelAssist references (in the form of Mxxx) for more information.
Introduction
Do you have some data and are you wondering how to best use it in your risk analysis model to estimate model parameters? Should you just take the average of your data or should you actually do some statistical analysis? If so, what kind?
Statistics deals with estimating parameters from past data, for example the average of a population. In more formal language, statistics makes use of information from a sample (‘your data’) to draw conclusions (‘inferences’) about the population from which the sample was taken.
This Risk Analysis Tip will give you an introduction to the use of data in risk analysis, alternative ways of analyzing data for your Crystal Ball model and tips on how to avoid common errors.
Representing data in a Crystal Ball model
One common practice (which is actually a mistake) in risk analysis is to represent data by just taking an average. Alternatively, statisticians often report their data analysis in the form of a best guess together with a confidence bound. For example, “average = 0.32 [95% CI: 0.27, 0.37]”.
Although both are common scientific and engineering practices, the output is inadequate for risk analysis needs because we need to have the entire uncertainty distribution from which to generate values. That is, we need a distribution that describes our uncertainty about the true mean of the population’s probability distribution.
Three ways
In risk analysis using Crystal Ball, there are three main approaches to statistics:
1. Classical statistics (M0081) is what we generally get taught at school and university: the z-test (M0165), t-test (M0166), chi-squared test (M0169 and M0168). All useful, but how often did we understand why? Mostly, we just got taught a set of procedures to follow, but it’s very important to understand the underlying assumptions when using classical statistics, because the validity of the analysis depends greatly on the correctness of those assumptions.
2. The Bootstrap (M0444) is a particular classical technique that is becoming ever more popular with good reason. It requires far fewer assumptions than the more common classical methods and has the flexibility to answer many more questions. It also returns the same answers as classical statistics where the assumptions match.
3. Bayesian inference (M0052) is, in our view, the most powerful of all the methods presented. It has the ability to estimate many parameters from the same dataset simultaneously, it is explicit about its models and assumptions, it is easy to incorporate different datasets into the one estimate, and it is very intuitive. Some classical statisticians say that Bayesian inference is extremely subjective, and one could therefore come up with any answer one wished. We don’t agree with this, but acknowledge that each of the three methods has its own problems. In situations where all methods can be used on the same dataset they usually come up with exactly the same, or very nearly the same, answer.
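To make the second approach more concrete, here is a minimal Python sketch of a non-parametric Bootstrap for the mean. It is an illustration outside Crystal Ball, not the ModelAssist implementation: the function name `bootstrap_means`, the number of resamples and the eight observations are all assumptions made up for this example.

```python
import random
import statistics

def bootstrap_means(data, n_resamples=2000, seed=1):
    """Return the means of resamples drawn with replacement from data."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(data)
    return [statistics.fmean(rng.choices(data, k=n)) for _ in range(n_resamples)]

# Hypothetical observations, invented for illustration only.
data = [0.29, 0.35, 0.31, 0.38, 0.27, 0.33, 0.30, 0.34]

# The sorted resample means approximate the uncertainty distribution of the mean.
means = sorted(bootstrap_means(data))
lower = means[int(0.025 * len(means))]  # approximate 2.5th percentile
upper = means[int(0.975 * len(means))]  # approximate 97.5th percentile
print(f"bootstrap 95% interval for the mean: [{lower:.3f}, {upper:.3f}]")
```

The full list `means`, rather than just the interval, is what a risk model would sample from, since it carries the whole shape of the uncertainty.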
To give you an idea of the range of statistical techniques available to estimate different parameters from a data set, we show the table below, drawn from ModelAssist (M0162). The table shows that to estimate parameters from a dataset, you can often use more than one method. In the example below, we show you how to determine and model the uncertainty distribution of the mean of a normally distributed parameter (see red circle).

An example
To give you an idea of how to use data in your Crystal Ball model to generate an uncertainty distribution, we will show it for the relatively simple situation where you assume that the data come from a Normal distribution (see red circle). In this case, the uncertainty distribution of the true mean is calculated from a Student-t distribution as follows:
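In its standard classical form, and with symbols that are assumed here rather than taken from the source ($\bar{x}$ for the sample mean, $s$ for the sample standard deviation, $n$ for the number of observations, and $t(n-1)$ for a Student-t variable with $n-1$ degrees of freedom), the relationship can be written as:

$$
\mu = \bar{x} + t(n-1)\,\frac{s}{\sqrt{n}}
$$

Here $\mu$ is the uncertain true mean: each random draw of $t(n-1)$ gives one possible value of $\mu$.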

Although this formula looks quite complicated, it really is not that difficult! You can represent the formula, and thus your uncertainty about the true mean, in your Crystal Ball model as shown in the following model.
> Download Model: Estimate mean and stdev for Normal distribution when neither known
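For readers who want to see the idea outside a spreadsheet, the following is a minimal Python sketch of sampling the uncertainty distribution of the true mean with a Student-t variable. It is not the downloadable Crystal Ball model. The function name `sample_true_mean` and the data values are hypothetical, and the Student-t variable is built from a standard Normal and a chi-squared variable so that only the standard library is needed.

```python
import math
import random
import statistics

def sample_true_mean(data, n_samples=5000, seed=1):
    """Sample possible values of the true mean, assuming Normal data."""
    rng = random.Random(seed)
    n = len(data)
    xbar = statistics.fmean(data)   # sample mean
    s = statistics.stdev(data)      # sample standard deviation (n - 1 divisor)
    df = n - 1                      # degrees of freedom
    samples = []
    for _ in range(n_samples):
        z = rng.gauss(0.0, 1.0)                 # standard Normal draw
        chi2 = rng.gammavariate(df / 2.0, 2.0)  # chi-squared(df) as a Gamma draw
        t = z / math.sqrt(chi2 / df)            # Student-t(df) draw
        samples.append(xbar + t * s / math.sqrt(n))
    return samples

# Hypothetical observations, invented for illustration only.
data = [0.29, 0.35, 0.31, 0.38, 0.27, 0.33, 0.30, 0.34]
samples = sample_true_mean(data)
print(f"mean of sampled true means: {statistics.fmean(samples):.3f}")
```

Each value in `samples` plays the role of one simulation trial: a possible true mean that is consistent with the observed data.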
Statistics fun?
Absolutely! Statistics is really not just about following rules. As you saw, there are a variety of methods for estimating uncertainty about model parameters and you need to decide when to use which approach.
An obvious question now is: which one is best? The answer is, ‘it depends’, and we therefore encourage you to compare the results of your model when using two or more different estimation techniques. If the uncertainty distributions from two methods are significantly different and you cannot choose between them, you can consider this as another source of uncertainty and simply combine the two distributions, using a Discrete distribution (M0129).
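One way to read this combination, sketched in Python under stated assumptions: on each trial a discrete choice picks one of the two methods, and the value is then drawn from that method's uncertainty distribution. The function name `combine`, the two Normal distributions standing in for the methods' results, and the equal weights are all hypothetical.

```python
import random

def combine(samplers, weights, n_samples=5000, seed=1):
    """Mix several uncertainty distributions using discrete weights."""
    rng = random.Random(seed)
    out = []
    for _ in range(n_samples):
        # Discrete step: choose which method supplies this trial's value.
        sampler = rng.choices(samplers, weights=weights, k=1)[0]
        out.append(sampler(rng))
    return out

# Hypothetical uncertainty distributions from two estimation methods.
def method_a(rng):
    return rng.gauss(0.32, 0.025)

def method_b(rng):
    return rng.gauss(0.35, 0.040)

# Equal weights express no preference between the two methods.
combined = combine([method_a, method_b], weights=[0.5, 0.5])
print(f"mean of combined distribution: {sum(combined) / len(combined):.3f}")
```

Unequal weights could be used if you have reason to trust one method more than the other.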
What to do next?
A great many risk analysis problems can be tackled with knowledge of the building blocks of risk analysis and statistics - stochastic processes. ModelAssist for Crystal Ball from Vose Consulting contains hundreds of risk analysis topics and model templates.
> Learn more about ModelAssist for Crystal Ball
* The material within this ‘Risk Analysis Tip’ comes from one of the hundreds of topics available in ModelAssist for Crystal Ball. ModelAssist for Crystal Ball gives a more detailed explanation of the above methods and any risk analysis techniques involved.