An important assumption for process capability analysis is that the data be normally distributed. The Capability Combination Report (Individuals Nonnormal) allows you to transform the data to normality or utilize nonnormal distributions, including:
Box-Cox Transformation (includes an automatic threshold option so that data with negative values can be transformed)
Johnson Transformation
Distributions supported:
Automatic Best Fit based on AD P-Value
For technical details, see Appendix: Statistical Details for Nonnormal Distributions and Transformations. Also see Andrew Sleeper, Six Sigma Distribution Modeling, for further information on these methods.
Note that these transformations and distributions are particularly effective for inherently skewed data but should not be used with bimodal data or where the nonnormality is due to outliers (typically identified with a Normal Probability Plot). In these cases, you should identify the reason for the bimodality or outliers and take corrective action. Another common reason for nonnormal data is poor measurement discrimination leading to chunky data. In this case, attempts should be made to improve the measurement system.
SigmaXL's default setting is to use the Box-Cox transformation, which is the most common approach to dealing with nonnormal data. Box-Cox converts nonnormal data to normal by applying a power transformation, Y^lambda, where lambda varies from -5 to +5. You may select rounded or optimal lambda. Rounded is typically preferred since it results in a more intuitive transformation such as Ln(Y) (lambda=0) or SQRT(Y) (lambda=0.5). If the data includes zero or negative values, select Lambda & Threshold. SigmaXL will solve for an optimal threshold, which is a shift factor applied to the data so that all of the values are positive.
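The power transformation and the choice between optimal and rounded lambda can be sketched in a few lines of Python. This is a minimal illustration, not SigmaXL's implementation: the function names are hypothetical, the search is a plain grid over -5 to +5 using the standard Box-Cox profile log-likelihood, rounding to the nearest 0.5 is an assumed rule, and `shift_to_positive` is a naive stand-in for the optimized threshold.

```python
import math
from statistics import NormalDist

def power_transform(y, lam, threshold=0.0):
    """Apply Y^lambda (or Ln(Y) when lambda = 0) after adding the threshold shift."""
    shifted = y + threshold
    if shifted <= 0:
        raise ValueError("value plus threshold must be positive")
    return math.log(shifted) if lam == 0 else shifted ** lam

def boxcox_loglik(data, lam):
    """Profile log-likelihood of lambda, using the scaled form (y^lambda - 1)/lambda."""
    n = len(data)
    if lam == 0:
        z = [math.log(v) for v in data]
    else:
        z = [(v ** lam - 1.0) / lam for v in data]
    center = sum(z) / n
    variance = sum((v - center) ** 2 for v in z) / n
    return -0.5 * n * math.log(variance) + (lam - 1.0) * sum(math.log(v) for v in data)

def optimal_lambda(data, step=0.01):
    """Grid search over -5..+5 for the lambda with the highest log-likelihood."""
    count = int(round(10 / step)) + 1
    grid = [round(-5 + i * step, 10) for i in range(count)]
    return max(grid, key=lambda lam: boxcox_loglik(data, lam))

def rounded_lambda(lam):
    """Round to the nearest 0.5 (assumed rounding rule, for illustration only)."""
    return round(lam * 2) / 2

def shift_to_positive(data, margin=1.0):
    """Naive threshold: shift so the minimum equals `margin` (not an optimized threshold)."""
    threshold = margin - min(data) if min(data) <= 0 else 0.0
    return [v + threshold for v in data]

# Illustrative data: exact lognormal quantiles, so Ln(Y) should be the natural choice.
data = [math.exp(NormalDist().inv_cdf((i + 0.5) / 30)) for i in range(30)]
lam = optimal_lambda(data)
print(lam, rounded_lambda(lam))
```

For this synthetic lognormal sample the search should land at or very near lambda = 0, which corresponds to the Ln(Y) transformation mentioned above.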
Clearly this is a process in need of improvement. To start, we would like to get a baseline process capability. The problem with using regular Capability analysis is that the results will be incorrect due to the nonnormality in the data. The Histogram and AD p-value < .05 clearly show that this data is not normal.
Notice the discrepancy between the Expected Overall (Theoretical) Performance and Actual (Empirical) Performance. This is largely due to the nonnormality in the data, since the expected performance assumes that the data is normal. So why not just use the actual performance and disregard the expected? This would not be reliable because the sample size, n = 30, is too small to estimate performance using pass/fail (discrete) criteria.
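The difference between the two performance figures can be illustrated with a short Python sketch. The sample and the upper specification limit below are hypothetical (the tutorial's data is not reproduced here); the point is only that the expected value comes from a fitted normal curve, while the actual value is a pass/fail count that can move only in steps of 1/n.

```python
import math
from statistics import NormalDist, mean, stdev

def expected_fraction_above(data, usl):
    """Theoretical fraction above the upper spec limit, assuming normality."""
    return 1.0 - NormalDist(mean(data), stdev(data)).cdf(usl)

def actual_fraction_above(data, usl):
    """Empirical fraction: a simple pass/fail count divided by n."""
    return sum(1 for v in data if v > usl) / len(data)

# Hypothetical skewed sample of n = 30 and a hypothetical upper spec limit.
sample = [math.exp(NormalDist(1.0, 0.8).inv_cdf((i + 0.5) / 30)) for i in range(30)]
usl = 10.0

expected = expected_fraction_above(sample, usl)
actual = actual_fraction_above(sample, usl)
resolution = 1 / len(sample)  # smallest nonzero step the empirical estimate can take
print(f"expected {expected:.4f}, actual {actual:.4f}, resolution {resolution:.4f}")
```

With n = 30, the empirical estimate cannot resolve defect rates finer than about 3.3%, which is why it is not a reliable substitute for a model-based estimate.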
Also note that the process appears to be out-of-control on both the individuals and moving range charts.
The AD Normality P-Value Transformed Data value of 0.404 confirms that the Box-Cox transformation to normality was successful. The process capability indices and expected performance can now be used to establish a baseline performance. Note that there are no out-of-control signals on the control charts, so the signals observed earlier when normality was assumed were false alarms.
The Individuals Original Data chart displays the untransformed data with control limits calculated as:
UCL = 99.865th percentile
CL = 50th percentile
LCL = 0.135th percentile
The benefit of displaying this chart is that one can observe the original untransformed data. Since the control limits are based on percentiles, they represent the overall, long-term variation rather than the typical short-term variation. The limits will likely be nonsymmetrical.
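One way to picture percentile-based limits is to take the 0.135th, 50th, and 99.865th percentiles on the transformed (normal) scale and map them back to the original units. The Python sketch below does this for a power transformation; it is an assumption about the mechanics rather than a description of SigmaXL's calculation, and the function names and parameter values are hypothetical.

```python
import math
from statistics import NormalDist

def inverse_power(t, lam):
    """Undo Y^lambda (or Ln(Y) when lambda = 0); t must be positive when lambda != 0."""
    return math.exp(t) if lam == 0 else t ** (1.0 / lam)

def percentile_limits(mu_t, sigma_t, lam):
    """Return (LCL, CL, UCL) in original units from a normal fit on the transformed scale."""
    fitted = NormalDist(mu_t, sigma_t)
    points = [inverse_power(fitted.inv_cdf(p), lam) for p in (0.00135, 0.5, 0.99865)]
    # A negative lambda reverses the ordering, so sort before returning.
    lcl, cl, ucl = sorted(points)
    return lcl, cl, ucl

# Hypothetical fit: Ln(Y) transformation with mean 1.0 and standard deviation 0.5.
lcl, cl, ucl = percentile_limits(1.0, 0.5, lam=0)
print(f"LCL {lcl:.3f}, CL {cl:.3f}, UCL {ucl:.3f}")
```

In this example the distance from CL to UCL is much larger than the distance from LCL to CL, which illustrates why the limits on the original scale are typically nonsymmetrical.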
The Individuals/Moving Range Normalized Data chart displays the transformed z-values with control limits calculated using the standard Shewhart formulas for Individuals and Moving Range charts. The benefit of using this chart is that tests for special causes can be applied and the control limits are based on short-term variation. The disadvantage is that one is observing transformed data on the chart rather than the original data.
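For reference, the standard Shewhart limits for Individuals and Moving Range charts use the average moving range of span 2, with the constants 2.66 (3/d2, d2 = 1.128) and 3.267 (D4). A minimal Python sketch follows; the function names and the input values are illustrative only.

```python
def imr_limits(values):
    """Return (LCL, CL, UCL) for the Individuals and Moving Range charts."""
    center = sum(values) / len(values)
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    return {
        "I": (center - 2.66 * mr_bar, center, center + 2.66 * mr_bar),
        "MR": (0.0, mr_bar, 3.267 * mr_bar),
    }

def out_of_limits(values, limits):
    """Indices of points outside the control limits (the basic special-cause test)."""
    lcl, _, ucl = limits
    return [i for i, v in enumerate(values) if v < lcl or v > ucl]

# Illustrative z-values; in practice these would be the normalized data.
z_values = [0.0, 1.0, 0.0, 1.0, 0.0]
limits = imr_limits(z_values)
print(limits["I"], limits["MR"], out_of_limits(z_values, limits["I"]))
```

Because the limits are driven by the average moving range between consecutive points, they reflect short-term variation, in contrast to percentile-based limits.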
Now we will redo the capability analysis using the Automatic Best Fit option.
The 2 Parameter Loglogistic distribution was selected as the best fit distribution. For details on how this selection was made, see Appendix: Statistical Details for Nonnormal Distributions and Transformations.
The Anderson Darling statistic for the Loglogistic distribution is 0.245, which is less than the 0.37 value for the AD Normality test of the Box-Cox transformation, indicating a better fit. (Note that published AD p-values for this distribution are limited to a maximum value of 0.25. The best fit selection uses a p-value estimate that is obtained by transforming the data to normality and then using a modified Anderson Darling Normality test on the transformed data.)
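To make the Anderson Darling Normality test concrete, the Python sketch below computes the A-squared statistic against a fitted normal distribution and converts it to a p-value using the widely cited D'Agostino and Stephens approximation. Whether SigmaXL uses exactly this approximation is an assumption, and the sketch covers only the normality test, not the Loglogistic fit. The function names are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev

def anderson_darling_normal(data):
    """A-squared statistic for normality with mean and sigma estimated from the data."""
    x = sorted(data)
    n = len(x)
    fitted = NormalDist(mean(x), stdev(x))
    cdf = [fitted.cdf(v) for v in x]
    total = sum(
        (2 * i + 1) * (math.log(cdf[i]) + math.log(1.0 - cdf[n - 1 - i]))
        for i in range(n)
    )
    return -n - total / n

def ad_p_value(a2, n):
    """Approximate p-value (D'Agostino and Stephens) after small-sample adjustment."""
    a = a2 * (1 + 0.75 / n + 2.25 / n ** 2)
    if a >= 0.6:
        return math.exp(1.2937 - 5.709 * a + 0.0186 * a * a)
    if a > 0.34:
        return math.exp(0.9177 - 4.279 * a - 1.38 * a * a)
    if a > 0.2:
        return 1 - math.exp(-8.318 + 42.796 * a - 59.938 * a * a)
    return 1 - math.exp(-13.436 + 101.14 * a - 223.73 * a * a)

# Statistic of 0.37 with n = 30, the values quoted in this example.
p_quoted = ad_p_value(0.37, 30)

# Synthetic check: exact normal quantiles should give a very small statistic.
scores = [NormalDist().inv_cdf((i + 0.5) / 30) for i in range(30)]
a2_scores = anderson_darling_normal(scores)
print(round(p_quoted, 3), a2_scores)
```

With an AD statistic of 0.37 and n = 30, this approximation gives a p-value of about 0.40, in line with the 0.404 reported for the Box-Cox transformed data.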
Another helpful tool to evaluate transformations and distributions is Distribution Fitting