# RP-04 Field: Monitoring and Adjustment of Calibration Intervals for Mass Standards

N.Dupuis-Désormeaux, Senior Engineer - Gravimetry
January 2003

1. Abstract

2. Rationale

3. Detailed Steps

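3.1 Establishment of the Sample Size

3.2 Computation of the “As-found” Values

3.3 Determination of Uncertainties

3.4 Establishment of Reliability Targets

3.5 Formulation of Statistical Hypotheses

3.6 Analysis of Data

3.7 Adjustment of the Calibration Interval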

4. Summary - At a Glance

5. References

## Abstract

Measurement Canada recognizes that it is important to ensure that the standards calibration activities are well documented and monitored. This is especially pertinent given the private sector’s increasing use of quality assurance and quality control programs.

This document addresses exclusively one quality control aspect of Measurement Canada’s mass calibration activities: the monitoring and adjustment of calibration intervals.

Monitoring:

An evaluation mechanism is suggested to determine if the number of masses found to be “out-of-tolerance” at the end of their calibration cycle is statistically significant. This verification is done through the analysis of “as-found” values that have been accumulated.

Adjustment:

A method for shortening or lengthening the calibration intervals is proposed for cases where the above analysis reveals significant differences from expected results.

## Rationale

Various acceptance sampling methods are used to estimate the maximum (and minimum) number of defects to be (statistically) expected.

For the monitoring of mass calibration activities, two evaluation tools are most appropriate: a maximum likelihood estimate method and the method proposed in ISO 2859-1.

A) Given a known sample size, confidence level and reliability target, a maximum likelihood estimate method with the probability density function of the normal distribution is used. This is similar to the method of ISO 7966:1993(E) when used with proportions.

The seven steps involved in the method are as follows:

1) Establishment of the Sample Size
2) Computation of the “As-found” Values
3) Determination of Uncertainties
4) Establishment of Reliability Targets
5) Formulation of Statistical Hypotheses
6) Analysis of Data
7) Adjustment of the Calibration Interval

These steps will be explained in detail in the following section.

B) The ISO 2859-1 standard with multiple sampling plans (Table IV-4) can also be used.

Although the ISO 2859-1 method offers a more advanced technique for computing acceptance quality levels, the benefits gained from using a more elaborate sampling plan seem to be limited at this time.

## Detailed Steps

3.1 Establishment of the Minimum Sample Size ‘N’

The determination of the minimum sample size is crucial to any statistical analysis since it defines the confidence that we will have in the results. The analysis is then considered conclusive only if all points recommended are sampled.

The maximum variance of the process, the maximum tolerable error between sample mean and true population average, and the confidence level must be known before the minimum sample size can be calculated.

The sample size is a function of:

a) the maximum variance of the data under study (σ²Max);
b) the maximum tolerable error (E) between the observed sample mean and the true population average; and
c) the confidence level (expressed as a Zα/2 value) with which we can say that the true population average is within ± E of the observed sample mean.

This relationship can be expressed as:

$$N \geq \left(\frac{Z_{\alpha/2}\,\sigma_{Max}}{E}\right)^{2}$$

3.1.1 Maximum Variance of the Data (σ²Max)

The maximum variance σ²Max of the data is the spread of values of the correction from nominal of the “as-found” values. It is based on experience. It is the only variable in the above sample size equation that needs to be entered by the Gravimetric Specialist.

For example, say that the “as-found” values for the correction from nominal of 20 kg test weights fluctuate between +1.5 tolerance and -1.5 tolerance. Therefore, the range of “as-found” values observed can be expressed as |Max - min| = |1.5 tolerance - (-1.5 tolerance)| = |1.5 + 1.5| tolerance = 3 tolerance. We treat this fluctuation as a rectangular distribution, and have

$$\sigma_{Max} = \frac{|Max - min|}{2\sqrt{3}} = \frac{3\ \text{tolerance}}{2\sqrt{3}} \approx 0.866\ \text{tolerance}$$

3.1.2 Maximum Tolerable Error (E)

The maximum tolerable error E is the maximum difference that we are willing to accept between the average value obtained for the sample and the “true” average of the population. In our case, we are gathering data on the correction from nominal value and thus will obtain an average value for this correction.

For mass calibrations, the maximum expanded combined uncertainty 2ucombined_max must be smaller than 1/3 of the applicable tolerance on that mass; therefore, the maximum ucombined_max (at 1σ) on the measurement is no greater than 1/6 tolerance.

It is reasonable to require that our experimental average be no farther than 1/6 tolerance from the true average of the population. Thus we set E = ucombined_max = 1/6 tolerance. However, when the calculated value of the combined uncertainty ucombined (see 3.3) is known, this value can be used instead of the maximum combined uncertainty ucombined_max discussed above.

This means that we require a sample of “N” points (see 3.1.4) in order to ensure that the “true” average correction from nominal value is no greater than 1/6 tolerance away from the observed average correction from nominal.

3.1.3 Z-statistic (Z α/2) Corresponding to the Confidence Level

Finally, we set the confidence level at 85%.

This implies that if we want to be 85% certain that the true population average will be within ± E from the observed sample average, we must collect at least “N” points. Because we are considering both possibilities + E and -E, we use a two-sided Z-statistic Z α/2.

For a confidence level (c.l.) of 85%, we have α = 1-c.l. = 0.15, or α/2 = 0.075; to this α/2 corresponds a Zα/2 = Z0.075 = Z(p: 1- α/2) = Z(p: 0.925) = 1.44.

Please note that if we were to set the confidence level at 95% as for the Calibration Services Laboratory (CSL), the number of sampling points would double, which is not operationally justifiable for Field applications. If the confidence level were 95%, the Zα/2 value would be Z(p: 1- α/2) = Z(p: 0.975) = 1.96.
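The Z values quoted in this section can be checked with the inverse of the standard normal cumulative distribution. The following is a minimal Python sketch; the function name `two_sided_z` is illustrative only and is not part of this procedure.

```python
from statistics import NormalDist


def two_sided_z(confidence_level: float) -> float:
    """Return Z(alpha/2) for a two-sided interval at the given confidence level."""
    alpha = 1.0 - confidence_level
    # Z(p: 1 - alpha/2) is the standard normal quantile at probability 1 - alpha/2.
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


z_85 = two_sided_z(0.85)  # about 1.44, the Field confidence level
z_95 = two_sided_z(0.95)  # about 1.96, the CSL confidence level
```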

3.1.4 Minimum Sample Size Computation

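The minimum sample size is therefore computed as follows:

$$N \geq \left(\frac{Z_{\alpha/2}\,\sigma_{Max}}{E}\right)^{2} = \left(\frac{1.44\,\sigma_{Max}}{E}\right)^{2}$$

Using the example in 3.1.1 for the 20 kg test weights, we would then require that:

$$N \geq \left(\frac{1.44 \times 0.866\ \text{tolerance}}{\tfrac{1}{6}\ \text{tolerance}}\right)^{2} \approx 55.99$$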

A number of N ≥ 56 is computed from the above. With N = 56, we therefore would have sufficient points to ensure that the confidence level is respected. It should be borne in mind that if fewer than 56 points are sampled then the confidence level will be directly affected.
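The sample size computation can be sketched in Python as follows. This sketch assumes the usual relationship N ≥ (Zα/2 · σMax / E)² and a rectangular distribution for the spread of “as-found” values; under those assumptions it reproduces the N ≥ 56 quoted for the 20 kg example. The function names are illustrative only.

```python
import math
from statistics import NormalDist


def rectangular_sigma(value_range: float) -> float:
    """Standard deviation of a rectangular distribution of total width value_range."""
    return value_range / (2.0 * math.sqrt(3.0))


def min_sample_size(sigma_max: float, max_error: float, confidence_level: float) -> int:
    """Smallest integer N satisfying N >= (Z(alpha/2) * sigma_max / max_error)**2."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence_level) / 2.0)
    return math.ceil((z * sigma_max / max_error) ** 2)


# All quantities are expressed in units of the tolerance.
sigma_max = rectangular_sigma(3.0)                   # range of 3 tolerance
n_85 = min_sample_size(sigma_max, 1.0 / 6.0, 0.85)   # Field confidence level
n_95 = min_sample_size(sigma_max, 1.0 / 6.0, 0.95)   # CSL confidence level
```

With these inputs the sketch gives 56 points at the 85% confidence level and 104 points at 95%, which illustrates why the higher confidence level roughly doubles the sampling effort.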

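If the confidence level were 95%, the above equation would become:

$$N \geq \left(\frac{1.96 \times 0.866\ \text{tolerance}}{\tfrac{1}{6}\ \text{tolerance}}\right)^{2} \approx 103.7, \quad \text{that is, } N \geq 104$$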

Please note that ISO 7966:1993(E) could also have been used to determine the sample size.

ISO 7966:1993(E)

Since the confidence level is 85%, the alpha risk and the beta risk are both set at 0.15. Hence, Zα = Zβ = Z(p: 1 - 0.15) = 1.038.
Zp0 = Z0.05 = Z(p: 1 - 0.05) = 1.645 (accept the lot if fewer than 5% of standards are outside the control limits).
Zp1 = Z0.10 = Z(p: 1 - 0.10) = 1.282 (reject the lot if more than 10% of standards are outside the control limits).

USL = Upper Specification Limit = 2/3 Tolerance, LSL = Lower Specification Limit = -2/3 Tolerance because outside these values the mass will need to be adjusted.

S = σ within = random variability of each data point, which in our case corresponds to a maximum value of 1/6 of the tolerance = 0.17 tolerance. Please note that this parameter is different from the standard deviation expected in the population as described in section 3.1.1.

APL = Acceptable Process Level
Upper: USL - Zp0 S = USL - 1.645 S = 2/3 Tol - 1.645 (0.17 Tol) = 0.3870 Tol
Lower: LSL + Zp0 S = LSL + 1.645 S = -2/3 Tol + 1.645 (0.17 Tol) = -0.3870 Tol
Masses with corrections smaller than ± 0.3870 tolerance away from nominal have an 85% chance of being accepted (and fewer than 5% will be over the ± 2/3 tolerance limit).

RPL = Rejectable Process Level
Upper: USL - Zp1 S = USL - 1.282 S = 2/3 Tol - 1.282 (0.17 Tol) = 0.4487 Tol
Lower: LSL + Zp1 S = LSL + 1.282 S = -2/3 Tol + 1.282 (0.17 Tol) = -0.4487 Tol
Masses with corrections greater than ± 0.4487 tolerance away from nominal have an 85% chance of being rejected (and more than 10% will exceed the ± 2/3 tolerance limit).

$$N > \frac{(Z_{\alpha} + Z_{\beta})^{2}\,S^{2}}{(RPL - APL)^{2}} = \frac{(2.076)^{2}\,(0.17\ \text{Tol})^{2}}{(0.0617\ \text{Tol})^{2}} \approx 32.7, \quad \text{that is, } N \geq 33$$

Note that in this case, we have N ≥ 33 instead of the previously calculated N ≥ 56; however, the acceptance/rejection criteria with the above method are slightly different.

Note: If the sample size is smaller than 30 points the analytic techniques presented in this paper are inappropriate and should not be used.
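The ISO 7966-style computation of this section can be checked with the short Python sketch below. It follows the APL, RPL and N expressions as written in this section; the function and parameter names are illustrative only. Exact normal quantiles are used, so Z(p: 0.85) evaluates to about 1.036 rather than the 1.038 quoted in the text, which does not change the rounded result.

```python
import math
from statistics import NormalDist


def z(p: float) -> float:
    """Standard normal quantile Z(p)."""
    return NormalDist().inv_cdf(p)


def sample_size_apl_rpl(usl: float, s: float, alpha: float, beta: float,
                        p0: float, p1: float) -> tuple[float, float, int]:
    """Return (APL, RPL, N) for the upper specification limit usl."""
    apl = usl - z(1.0 - p0) * s   # acceptable process level
    rpl = usl - z(1.0 - p1) * s   # rejectable process level
    n = math.ceil(((z(1.0 - alpha) + z(1.0 - beta)) * s / (rpl - apl)) ** 2)
    return apl, rpl, n


# All quantities are expressed in units of the tolerance.
apl, rpl, n = sample_size_apl_rpl(usl=2 / 3, s=0.17, alpha=0.15, beta=0.15,
                                  p0=0.05, p1=0.10)
```

Because RPL - APL is itself proportional to S, the value of S cancels in the expression for N; the sample size here depends only on the chosen risks and defect fractions.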

3.2 Computation of the “As-found” Values

Full calibration procedures are found in RP-01 Field: Calibration Procedures for Standards of Mass.

Using the calibration method described in RP-01 Field, determine the “as-found” value of at least the same number of standards as the minimum sample size determined in 3.1.4 above (with the given example n ≥ N = 56).

The “as-found” values are obtained prior to cleaning or adjusting the weights.

3.3 Determination of Uncertainties

Full computation of the uncertainties for mass calibration activities can be found in RP-02 Field: Determination of Mass Calibration Values and Related Uncertainties.

Compute or note the associated uncertainty for the nominal value and class of the mass standard under evaluation. If this information is not available, then a maximum combined uncertainty (at 1σ) of 1/6 tolerance can be used; this is discussed in section 3.1.2.

3.4 Establishment of Reliability Targets

Parameters for analysis:
- Reliability target: 90% of population must fall within control limits
- Control Limits: ± 2/3 tolerance away from nominal
- Confidence Level in meeting target: 85%

The reliability target is set such that: at the end of their calibration cycle, 90% of the standards must fall within the control limits.

The control limits are set according to the assumption that we want the values for the correction from nominal to be within ± 2/3 tolerance from nominal when the masses return for calibration. This also accounts for an expanded uncertainty in the determination of the correction equal to ± 1/3 tolerance.

In other words, a mass is called a defect if, when it returns for calibration, its mass value is beyond the control limits of ± 2/3 tolerance from nominal. The number of total defects observed is called Xdefects.

The confidence level represents that Measurement Canada can be 85% certain that the analysis performed will adequately detect when more than 10% of the standards are outside the control limits at the end of their calibration period.
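Counting defects against the control limits is straightforward once the “as-found” corrections are expressed in units of the tolerance. The following Python sketch illustrates this; the function name and the sample values are hypothetical and serve only as an illustration.

```python
def count_defects(corrections_in_tolerance_units, control_limit=2.0 / 3.0):
    """Count corrections lying beyond +/- control_limit (in units of tolerance)."""
    return sum(1 for c in corrections_in_tolerance_units if abs(c) > control_limit)


# Hypothetical "as-found" corrections from nominal, each divided by the tolerance.
as_found = [0.10, -0.25, 0.70, 0.40, -0.90, 0.66]
x_defects = count_defects(as_found)  # 0.70 and -0.90 lie beyond 2/3 tolerance
```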

3.5 Formulation of Statistical Hypotheses

After the reliability target, control limits and confidence level have been established, a comparison criterion is defined via statistical hypotheses. The “as-found” data are then compared to these statistical hypotheses.

In hypothesis testing, a null hypothesis is formulated and compared against its alternative hypothesis.

In our case,

- the null hypothesis H0 is: no more than 10% of the population will be outside the control limits;
- its alternative hypothesis H1 is: more than 10% of the population will be outside the control limits.

Note: the population is the total number of active standards of the same nominal value, class and usage.

This can be written as:

H0: X ≤ (10%) M out of control limits
H1: X > (10%) M out of control limits
where X is the total number of defects within a population of M standards.

3.6 Analysis of Data

The “as-found” data are now compared against the statistical hypotheses. This step is crucial as it determines whether the number of observed “good” points (within the control limits) is sufficient to not reject the H0 hypothesis. Likewise, the number of observed points that are “out of control limits” can be used to test our hypotheses.

To determine if the null hypothesis is to be rejected in favor of the alternative hypothesis, we must calculate what is called a test statistic. Hence, we are now ready to compare our sample data to our expected results by means of the test statistic.

Technical note:

Using a maximum likelihood estimate and the probability density function of the binomial distribution, we can estimate what fraction of items will still be in tolerance at the end of the calibration period. A Bernoulli trial is used since there are only two possible outcomes: either the points are within the control limits or they are outside these limits.

When the (cumulative) normal distribution is used instead of the (discrete) binomial distribution, a correction for continuity is necessary when the sample is small (n smaller than 30). Further, the normal distribution can be used instead of the binomial distribution if either of the following conditions is met:

If n ≥ 30 OR
If np ≥ 5 AND n(1-p) ≥ 5

where n is the number of points sampled and p is the probability of success (or failure) expected.
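The conditions above can be written as a simple check. The Python sketch below also includes an exact binomial tail probability, which can serve as a cross-check when the normal approximation is in doubt; both function names are illustrative and are not part of this procedure.

```python
from math import comb


def normal_approximation_ok(n: int, p: float) -> bool:
    """True if n >= 30, or if both np >= 5 and n(1-p) >= 5."""
    return n >= 30 or (n * p >= 5 and n * (1 - p) >= 5)


def binomial_upper_tail(x: int, n: int, p: float) -> float:
    """Exact probability P(X >= x) for X following a Binomial(n, p) distribution."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x, n + 1))


usable_56 = normal_approximation_ok(56, 0.10)   # n >= 30, so the approximation applies
usable_20 = normal_approximation_ok(20, 0.10)   # n < 30 and np = 2, so it does not
```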

The Z test statistic will be used in our analysis. It is based on the normal distribution. When using a normal approximation, it is necessary to know the sample mean and standard deviation. Regarding proportions, the mean is expressed as μ = np and the standard deviation is expressed as σ = √(npq), where n is the number of points sampled, p is the proportion of defects expected and q = 1 - p is the proportion of non-defective items expected.

The Z statistic is expressed as:

$$Z = \frac{X - np}{\sqrt{npq}}$$

where

X = the number of defects found in the sample
n = the total number of points sampled
p = the expected (target) probability of defects

The Z statistic is compared to the Zα value for expected results corresponding to the confidence level of our analysis. In fact we are computing a critical value for testing the null hypothesis by means of Zα against which the results are compared. If the calculated Z value falls within the acceptance region, the null hypothesis is not to be rejected.

For a confidence level of 85%, α = 1 - c.l. = 0.15; to this α corresponds a Zα value for a one-sided hypothesis of: Z0.15 = Z(p: 1 - α) = Z(p: 0.85) = 1.038.
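The comparison of the test statistic with the critical value can be sketched in Python as follows. The sketch assumes the usual Z statistic for proportions, Z = (X - np) / √(npq), and a one-sided test in which H0 is rejected when Z exceeds Zα; the function names are illustrative only. The exact quantile Z(p: 0.85) is about 1.036, close to the 1.038 used in the text.

```python
import math
from statistics import NormalDist


def z_statistic(x_defects: int, n: int, p: float) -> float:
    """Z statistic for x_defects observed in n points with expected defect fraction p."""
    return (x_defects - n * p) / math.sqrt(n * p * (1.0 - p))


def reject_h0(x_defects: int, n: int, p: float = 0.10,
              confidence_level: float = 0.85) -> bool:
    """True if the data support H1: more than a fraction p of the population is defective."""
    z_alpha = NormalDist().inv_cdf(confidence_level)  # one-sided critical value
    return z_statistic(x_defects, n, p) > z_alpha


reject_with_8 = reject_h0(8, 56)  # Z is about 1.07, above the critical value
reject_with_7 = reject_h0(7, 56)  # Z is about 0.62, within the acceptance region
```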

3.6.1 Maximum Allowable Number of Defects

The following equation is used to calculate the maximum allowable number of defects to be observed in the sample of “n” points:

$$X_{max} = Z_{\alpha}\sqrt{npq} + np$$

Note: It is given that p = 10% = 0.1. This is because we want no more than 10% defects in the entire population.

If n = 56 points are sampled, we have that, for an 85% confidence level: Xmax = 1.038 √5.04 + 5.6 = 7.93

In other words, if fewer than 8 defects (that is, 7 or fewer) are observed for a sample of 56 points, we can be 85% certain that our decision to “not reject” the hypothesis H0 is a good choice and we can assume that no more than 10% of the total population of weights of the same nominal value will be outside the control limits.

The number of defects observed should be no greater than the value of Xmax above; otherwise, the calibration interval should be shortened according to section 3.7.

3.6.2 Minimum Number of Defects

The same process can be used to decide whether the number of defects is significantly lower than the 10% mark. In this case, we have:

$$X_{min} = -Z_{\alpha}\sqrt{npq} + np$$

Note: It is given that p = 10% = 0.1. This is because we want no more than 10% defects in the entire population.

If n = 56 points are sampled, we have that, for an 85% confidence level: Xmin = -1.038 √5.04 + 5.6 = 3.27

In other words, if 3 or fewer defects are observed for a sample of 56 points, we can be 85% certain in asserting that fewer than 10% of the total population of weights of the same nominal value will be outside the control limits.

If the number of defects observed is much smaller than Xmin above, perhaps the calibration interval should be lengthened. This must be discussed between the Gravimetric Specialist and the Senior Gravimetric Engineer; and if the interval needs to be lengthened, this shall be approved by Senior Management.
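Both limits can be computed together. The Python sketch below assumes the expressions Xmax = Zα√(npq) + np and Xmin = -Zα√(npq) + np, which reproduce the values 7.93 and 3.27 quoted above for n = 56; the function name is illustrative only.

```python
import math
from statistics import NormalDist


def defect_limits(n: int, p: float = 0.10,
                  confidence_level: float = 0.85) -> tuple[float, float]:
    """Return (x_min, x_max), the lower and upper limits on the number of defects."""
    z_alpha = NormalDist().inv_cdf(confidence_level)      # one-sided Z value
    spread = z_alpha * math.sqrt(n * p * (1.0 - p))       # Z_alpha * sqrt(npq)
    return n * p - spread, n * p + spread


x_min_56, x_max_56 = defect_limits(56)            # about 3.27 and 7.93
x_min_72, x_max_72 = defect_limits(72, p=0.20)    # about 10.88 and 17.92
```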

3.6.3 Examples

Here are two examples of the calculations involved:

Example 1: Reliability Target q = 90%, or p = 10%. In other words, 90% of the standards are within control limits at the end of their calibration cycle, and the number of points sampled is n = 56.

$$X_{max} = 1.038\sqrt{56\,(0.1)(0.9)} + 56\,(0.1) = 7.93, \qquad X_{min} = -1.038\sqrt{56\,(0.1)(0.9)} + 56\,(0.1) = 3.27$$

The above implies that the total number of defects observed Xdefects out of the 56 standards sampled must be larger than 3 and smaller than 8.

Example 2: Reliability Target q = 80%, or p = 20%. In other words, 80% of the standards are within control limits at the end of their calibration cycle, and the number of points sampled is n = 72.

$$X_{max} = 1.038\sqrt{72\,(0.2)(0.8)} + 72\,(0.2) = 17.92, \qquad X_{min} = -1.038\sqrt{72\,(0.2)(0.8)} + 72\,(0.2) = 10.88$$

The above implies that the total number of defects observed Xdefects out of the 72 standards sampled must be larger than 10 and smaller than 18 (that is, between 11 and 17).

3.7 Adjustment of the Calibration Interval

The method used to determine the new calibration interval is based on an exponential reliability model and is similar to Method A-3 described in the document Establishment and Adjustment of Calibration Intervals, Recommended Practice RP-1, January 1996, produced by the National Conference of Standards Laboratories.

The calibration interval may need to be shortened (or lengthened) if the above analysis detects more (or fewer) defects than is statistically expected. The interval is then adjusted using an exponential reliability model as follows:

where

I1 = the revised calibration interval
I0 = the present calibration interval
R = the target (expected) reliability in terms of numbers of defects per population
= (maximum number of defects allowed) ÷ (total population of standards)
R0 = the observed reliability for the interval I0
= (number defects found at I0) ÷ (number of points sampled at I0)
= Xdefects / n

Calibration intervals can be shortened as required by using the above exponential reliability model after having discussed the issue with the Gravimetric Specialist. The Gravimetric Specialist shall inform the Senior Gravimetric Engineer when calibration intervals are shortened.

If data shows that the interval should be lengthened, the Gravimetric Specialist shall contact the Senior Gravimetric Engineer for possible action. Please note that lengthening calibration intervals requires modifications to the Weights and Measures Act and Regulations and should not take place unless approved by Senior Management.
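As an illustration only, the Python sketch below shows how an exponential reliability model can be used to rescale an interval. It rests on assumptions that are not confirmed by this document: the textbook form R(t) = exp(-λt), which gives I1 = I0 · ln(R target) / ln(R observed), with both reliabilities expressed as in-tolerance fractions (1 minus the defect fraction). The expression actually intended in this section may differ, and the function name and the 12-month interval are hypothetical.

```python
import math


def revised_interval(current_interval: float, observed_in_tolerance: float,
                     target_in_tolerance: float) -> float:
    """Rescale an interval under the ASSUMED model R(t) = exp(-lambda * t).

    Both reliabilities are in-tolerance fractions strictly between 0 and 1.
    """
    for r in (observed_in_tolerance, target_in_tolerance):
        if not 0.0 < r < 1.0:
            raise ValueError("reliabilities must lie strictly between 0 and 1")
    return current_interval * math.log(target_in_tolerance) / math.log(observed_in_tolerance)


# Hypothetical case: 12-month interval, 10 defects in 56 points, 90% target.
observed = 1.0 - 10 / 56
new_interval = revised_interval(12.0, observed, 0.90)  # shorter than 12 months
```

Under this assumed model, an observed in-tolerance fraction below the target always yields a shorter interval, and an observed fraction equal to the target leaves the interval unchanged.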

## Summary - At a Glance

To re-establish calibration intervals, the following calculations must be made:

a) Select ‘n’ standards out of the total population of standards and compute their as-found value prior to cleaning and adjustment. Ensure that n ≥ N, where ‘N’ is the minimum sample size, as detailed in section 3.1.

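b) Use the following equation to compute the maximum number of defects to be observed:

$$X_{max} = Z_{\alpha}\sqrt{npq} + np$$

If the reliability target is 90% (q=90%, p=10%), the following is used, with Zα = 1.038 for the 85% confidence level:

$$X_{max} = 1.038\sqrt{0.09\,n} + 0.1\,n$$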

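c) Use the following equation to compute the minimum number of defects to be observed:

$$X_{min} = -Z_{\alpha}\sqrt{npq} + np$$

If the reliability target is 90% (q=90%, p=10%), the following is used, with Zα = 1.038 for the 85% confidence level:

$$X_{min} = -1.038\sqrt{0.09\,n} + 0.1\,n$$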

d) Compute the number of standards with an “as-found” correction from nominal that is beyond the control limits of ± 2/3 tolerance. This number is the number of defects Xdefects.

e) If Xdefects > Xmax, discuss the issue with the Gravimetric Specialist, who will calculate the new calibration interval using the exponential reliability model described in section 3.7.

If Xdefects < Xmin , discuss the issue with the Gravimetric Specialist who will then contact the Senior Gravimetric Engineer to determine if action is necessary.
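The steps of this summary can be combined into a single check, sketched below in Python. It assumes corrections expressed in units of the tolerance and the limits Xmax = Zα√(npq) + np and Xmin = -Zα√(npq) + np; the function name and the returned labels are illustrative only and do not replace the discussions and approvals required above.

```python
import math
from statistics import NormalDist


def review_calibration_interval(corrections, n_min: int, p: float = 0.10,
                                confidence_level: float = 0.85,
                                control_limit: float = 2.0 / 3.0) -> str:
    """Classify a set of as-found corrections (in units of tolerance)."""
    n = len(corrections)
    if n < n_min:
        return "insufficient sample"                       # step a) not satisfied
    x_defects = sum(1 for c in corrections if abs(c) > control_limit)   # step d)
    z_alpha = NormalDist().inv_cdf(confidence_level)
    spread = z_alpha * math.sqrt(n * p * (1.0 - p))
    x_min, x_max = n * p - spread, n * p + spread          # steps c) and b)
    if x_defects > x_max:
        return "shorten interval"                          # step e), first case
    if x_defects < x_min:
        return "consider lengthening interval"             # step e), second case
    return "keep interval"


# Hypothetical sample of 56 corrections, 10 of which exceed 2/3 tolerance.
decision = review_calibration_interval([0.9] * 10 + [0.0] * 46, n_min=56)
```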

## References

1. Chao, Lincoln L., Introduction to Statistics, Brooks/Cole Publishing Company, Monterey, California, 1980.

2. ISO Standards Handbook, Statistical methods for quality control, vol. 1: Terminology and symbols, Acceptance sampling, fourth edition, 1995.

3. ISO Standards Handbook, Statistical methods for quality control, vol. 2: Measurement methods and results, Interpretation of statistical data, Process control, fourth edition, 1995.

4. Joint International Committee ISO/IEC/OIML/BIPM- TAG-4, Guide to the Expression of Uncertainty in Measurement, first edition, 1993.

5. Johnson, Robert, Elementary Statistics, third edition, North Scituate, Massachusetts: Duxbury Press, 1980.

6. Taylor, John K., and Oppermann, Henry V., Handbook for the Quality Assurance of Metrological Measurements, NBS Handbook 145, 1986.

7. Wonnacott, Thomas H., and Wonnacott, Ronald J., Introductory Statistics, New York: John Wiley & Sons, Inc., 1969.