In this post, I will illustrate the use of prediction intervals for the

comparison of measurement methods. In the example a new spectral method

for measuring whole blood hemoglobin is compared with a reference

method.

But first, let’s start with discussing the large difference between a

confidence interval and a prediction interval.

##### Prediction interval versus Confidence interval

Very often a confidence interval is misinterpreted as a prediction

interval, leading to unrealistic “precise” predictions. As you will see,

prediction intervals (PI) resemble confidence intervals (CI), but the

width of the PI is by definition larger than the width of the CI.

Let’s assume that we measure the whole blood hemoglobin concentration in

a random sample of 100 persons. We obtain the estimated mean (Est_mean),

limits of the confidence interval (CI_Lower and CI_Upper) and limits of

the prediction interval (PI_Lower and PI_Upper):

(The R-code to do this is in the next paragraph)

Est_mean | CI_Lower | CI_Upper | PI_Lower | PI_Upper |
---|---|---|---|---|

140 | 138 | 143 | 113 | 167 |

A **Confidence interval** (CI) is an interval of good estimates of the

unknown true **population parameter**. About a 95% confidence interval

for the mean, we can state that if we would repeat our sampling process

infinitely, 95% of the constructed confidence intervals would contain

the true population mean. In other words, there is a 95% chance of

selecting a sample such that the 95% confidence interval calculated from

that sample contains the true population mean.

Interpretation of the 95% confidence interval in our example:

-The values contained in the interval [138g/L to 143g/L] are good

estimates of the unknown **mean** whole blood hemoglobin concentration

in the population. In general, if we would repeat our sampling process

infinitely, 95% of such constructed confidence intervals would contain

the true mean hemoglobin concentration.

A **Prediction interval** (PI) is an estimate of an interval in which**a future observation** will fall, with a certain confidence level,

given the observations that were already observed. About a 95%

prediction interval we can state that if we would repeat our sampling

process infinitely, 95% of the constructed prediction intervals would

contain the new observation.

Interpretation of the 95% prediction interval in the above example:

-Given the observed whole blood hemogblobin concentrations, the whole

blood hemogblobin concentration of a **new sample** will be between

113g/L and 167g/L with a confidence of 95%. In general, if we would

repeat our sampling process infinitely, 95% of the such constructed

prediction intervals would contain the new hemoglobin concentration

measurement.

Remark: Very often we will read the interpretation “The whole blood

hemogblobin concentration of a new sample will be between 113g/L and

167g/L with a **probability** of 95%.” (for example on__wikipedia__).

This interpretation is correct in the theoretical situation where the

parameters (true mean and standard deviation) are known.

##### Estimating a prediction interval in R

First, let’s simulate some data. The sample size in the plot above was

(n=100). Now, to see the effect of the sample size on the width of the

confidence interval and the prediction interval, let’s take a “sample”

of 400 hemoglobin measurements using the same parameters:

```
set.seed(123)
hemoglobin<-rnorm(400, mean = 139, sd = 14.75)
df<-data.frame(hemoglobin)
```

Although we don’t need a linear regression yet, I’d like to use the lm()

function, which makes it very easy to construct a confidence interval

(CI) and a prediction interval (PI). We can estimate the mean by fitting

a “regression model” with an intercept only (no slope). The default

confidence level is 95%.

-Confidence interval:

```
CI<-predict(lm(df$hemoglobin~ 1), interval="confidence")
CI[1,]
```

```
## fit lwr upr
## 139.2474 137.8425 140.6524
```

The CI object has a length of 400. But since there is no slope in our

“model”, each row is exactly the same.

-Prediction interval:

`PI<-predict(lm(df$hemoglobin~ 1), interval="predict") `

`## Warning in predict.lm(lm(df$hemoglobin ~ 1), interval = "predict"): predictions on current data refer to _future_ responses`

`PI[1,]`

```
## fit lwr upr
## 139.2474 111.1134 167.3815
```

We get a “warning” that “predictions on current data refer to future

responses”. That’s exactly what we want, so no worries there. As you

see, the column names of the objects CI and PI are the same.

Now, let’s visualize the confidence and the prediction interval.

The code below is not very elegant but I like the result (tips are

welcome :-))

```
library(ggplot2)
limits_CI <- aes(x=1.3 , ymin=CI[1,2], ymax =CI[1,3])
limits_PI <- aes(x=0.7 , ymin=PI[1,2], ymax =PI[1,3])
PI_CI<-ggplot(df, aes(x=1, y=hemoglobin)) +
geom_jitter(width=0.1, pch=21, fill="grey", alpha=0.5) +
geom_errorbar (limits_PI, width=0.1, col="#1A425C") +
geom_point (aes(x=0.7, y=PI[1,1]), col="#1A425C", size=2) +
geom_errorbar (limits_CI, width=0.1, col="#8AB63F") +
geom_point (aes(x=1.3, y=CI[1,1]), col="#8AB63F", size=2) +
scale_x_continuous(limits=c(0,2))+
scale_y_continuous(limits=c(95,190))+
theme_bw()+ylab("Hemoglobin concentration (g/L)") +
xlab(NULL)+
geom_text(aes(x=0.6, y=160, label="Predictionninterval",
hjust="right", cex=2), col="#1A425C")+
geom_text(aes(x=1.4, y=140, label="Confidenceninterval",
hjust="left", cex=2), col="#8AB63F")+
theme(legend.position="none",
axis.text.x = element_blank(),
axis.ticks.x = element_blank())
PI_CI
```

The width of the confidence is very small, now that we have this large

sample size (n=400). This is not surprising, as the estimated mean is

the only source of uncertainty. In contrast, the width of the prediction

interval is still substantial. The prediction interval has two sources

of uncertainty: the estimated mean (just like the confidence interval)

and the random variance of new observations.

##### Example: comparing a new with a reference measurement method.

A prediction interval can be useful in the case where a new method

should replace a standard (or reference) method. If we can predict well

enough what the measurement by the reference method would be, (given the

new method), than the two methods give similar information and the new

method can be used.

For example in (Tian, 2017) a new spectral method (Near-Infra-Red) to

measure hemoglobin is compared with a Golden Standard. In contrast with

the Golden Standard method, the new spectral method does not require

reagents. Moreover, the new method is faster. We will investigate

whether we can predict well enough, based on the measured concentration

of the new method, what the measurement by the Golden Standard would be.

(note: the measured concentrations presented below are fictive)

```
Hb<- read.table("https://usercontent.one/wp/rforbiostatistics.colmanstatistics.be/wp-content/uploads/2023/03/Hb.txt",
header = TRUE)
library(pander)
pander(head(Hb))
```

New | Reference |
---|---|

84.97 | 87.24 |

99.91 | 103.4 |

111.8 | 116.7 |

117 | 116.7 |

118.1 | 113.5 |

118.2 | 121.7 |

```
plot(Hb$New, Hb$Reference,
xlab="Hemoglobin concentration (g/L) - new method",
ylab="Hemoglobin concentration (g/L) - reference method")
```

##### Prediction Interval based on linear regression:

Now, let’s fit a linear regression model predicting the hemogoblin

concentrations measured by the reference method, based on the

concentrations measured with the new method.

```
fit.lm <- lm(Hb$Reference ~ Hb$New)
plot(Hb$New, Hb$Reference,
xlab="Hemoglobin concentration (g/L) - new method",
ylab="Hemoglobin concentration (g/L) - reference method")
cat ("Adding the regression line:")
```

Adding the regression line:

```
abline (a=fit.lm$coefficients[1], b=fit.lm$coefficients[2] )
cat ("Adding the identity line:")
```

Adding the identity line:

`abline (a=0, b=1, lty=2)`

If both measurement methods would exactly correspond, the intercept

would be zero and the slope would be one (=“identity line”, dotted

line).

Now let’s calculate the confidence interval for this linear regression.

```
CI_ex <- predict(fit.lm, interval="confidence")
colnames(CI_ex)<- c("fit_CI", "lwr_CI", "upr_CI")
```

And the prediction interval:

`PI_ex <- predict(fit.lm, interval="prediction")`

`## Warning in predict.lm(fit.lm, interval = "prediction"): predictions on current data refer to _future_ responses`

`colnames(PI_ex)<- c("fit_PI", "lwr_PI", "upr_PI")`

We can combine those results in one dataframe and plot both the

confidence interval and the prediction interval

```
Hb_results<-cbind(Hb, CI_ex, PI_ex)
pander(head(Hb_results))
```

New | Reference | fit_CI | lwr_CI | upr_CI | fit_PI | lwr_PI | upr_PI |
---|---|---|---|---|---|---|---|

84.97 | 87.24 | 90.58 | 87.39 | 93.76 | 90.58 | 82.14 | 99.02 |

99.91 | 103.4 | 103.8 | 101.4 | 106.2 | 103.8 | 95.66 | 112 |

111.8 | 116.7 | 114.4 | 112.6 | 116.2 | 114.4 | 106.4 | 122.4 |

117 | 116.7 | 119 | 117.4 | 120.5 | 119 | 111 | 126.9 |

118.1 | 113.5 | 120 | 118.5 | 121.5 | 120 | 112 | 127.9 |

118.2 | 121.7 | 120.1 | 118.6 | 121.6 | 120.1 | 112.1 | 128 |

Visualizing the CI (in green) and the PI (in blue):

```
plot(Hb$New, Hb$Reference,
xlab="Hemoglobin concentration (g/L) - new method",
ylab="Hemoglobin concentration (g/L) - reference method")
Hb_results_s <- Hb_results[order(Hb_results$New),]
lines (x=Hb_results_s$New, y=Hb_results_s$fit_CI)
lines (x=Hb_results_s$New, y=Hb_results_s$lwr_CI,
col="#8AB63F", lwd=1.2)
lines (x=Hb_results_s$New, y=Hb_results_s$upr_CI,
col="#8AB63F", lwd=1.2)
lines (x=Hb_results_s$New, y=Hb_results_s$lwr_PI,
col="#1A425C", lwd=1.2)
lines (x=Hb_results_s$New, y=Hb_results_s$upr_PI,
col="#1A425C", lwd=1.2)
abline (a=0, b=1, lty=2)
```

In (Bland, Altman 2003) it is proposed to calculate the average width of

this prediction interval, and see whether this is acceptable. Another

approach is to compare the calculated PI with an “acceptance interval”.

If the PI is inside the acceptance interval for the measurement range of

interest (see Francq, 2016).

In the above example, both methods do have the same measurement scale

(g/L), but the linear regression with prediction interval is

particularly useful when the two methods of measurement have different

units.

However, the method has some disadvantages:

- Predictions intervals are very sensitive to deviations from the

normal distribution. - In “standard” linear regression (or Ordinary Least Squares (OLS)

regression),the presence of measurement error is allowed for the

Y-variable (here, the reference method) but not for the X-variable

(the new method). The absence of errors on the x-axis is one of the

assumptions. Since we can expect some measurement error for the new

method, this assumption is violated here.

##### Taking into account errors on both axes

In contrast to Ordinary Least Square (OLS) regression, Bivariate Least

Square (BLS) regression takes into account the measurement errors of

both methods (the New method and the Reference method). Interestingly,

prediction intervals calculated with BLS are not affected when the axes

are switched (del Rio, 2001).

In 2017, a new R-package BivRegBLS was released. It offers several

methods to assess the agreement in method comparison studies, including

Bivariate Least Square (BLS) regression.

If the data are unreplicated but the variance of the measurement error

of the methods is known, The BLS() and XY.plot() functions can be used

to fit a bivariate Least Square regression line and corresponding

confidence and prediction intervals.

```
library (BivRegBLS)
Hb.BLS = BLS (data = Hb, xcol = c("New"),
ycol = c("Reference"), var.y=10, var.x=8, conf.level=0.95)
XY.plot (Hb.BLS,
yname = "Hemoglobin concentration (g/L) - reference method",
xname = "Hemoglobin concentration (g/L) - new method",
graph.int = c("CI","PI"))
```

Now we would like to decide whether the new method can replace the

reference method. We allow the methods to differ up to a given

threshold, which is not clinically relevant. Based on this threshold an

“acceptance interval” is created. Suppose that differences up to 10 g/L

(=threshold) are not clinically relevant, then the acceptance interval

can be defined as Y=X???, with ?? equal to 10. If the PI is inside the

acceptance interval for the measurement range of interest then the two

measurement methods can be considered to be interchangeable (see Francq,

2016).

The *accept.int* argument of the XY.plot() function allows for a

visualization of this acceptance interval

```
XY.plot (Hb.BLS,
yname = "Hemoglobin concentration (g/L) - reference method",
xname = "Hemoglobin concentration (g/L) - new method",
graph.int = c("CI","PI"),
accept.int=10)
```

For the measurement region 120g/L to 150 g/L, we can conclude that the

difference between both methods is acceptable. If the measurement

regions below 120g/l and above 150g/L are important, the new method

cannot replace the reference method.

##### Regression on replicated data

In method comparison studies, it is advised to create replicates (2 or

more measurements of the same sample with the same method). An example

of such a dataset:

```
Hb_rep <- read.table("https://usercontent.one/wp/rforbiostatistics.colmanstatistics.be/wp-content/uploads/2023/03/Hb_rep.txt",
header = TRUE)
pander(head(Hb_rep ))
```

New_rep1 | New_rep2 | Ref_rep1 | Ref_rep2 |
---|---|---|---|

88.25 | 94.98 | 90.17 | 84.01 |

109.1 | 109.2 | 106.9 | 98.17 |

114.6 | 122.5 | 116.2 | 118.2 |

120.2 | 114.6 | 114.5 | 104.3 |

108.4 | 110.7 | 116.5 | 108.7 |

125.9 | 124.4 | 112.6 | 112.4 |

When replicates are available, the variance of the measurement errors

are calculated for both the new and the reference method, and used to

estimate the prediction interval. Again, the BLS() function and the

XY.plot() function are used to estimate and plot the BLS regression

line, the corresponding CI and PI.

```
Hb_rep.BLS = BLS (data = Hb_rep,
xcol = c("New_rep1", "New_rep2"),
ycol = c("Ref_rep1", "Ref_rep2"),
qx = 1, qy = 1,
conf.level=0.95, pred.level=0.95)
XY.plot (Hb_rep.BLS,
yname = "Hemoglobin concentration (g/L) - reference method",
xname = "Hemoglobin concentration (g/L) - new method",
graph.int = c("CI","PI"),
accept.int=10)
```

It is clear that the prediction interval is not inside the acceptance

interval here. The new method cannot replace the reference method. A

possible solution is to average the repeats. The BivRegBLS package can

create prediction intervals for the mean of (2 or more) future values,

too!

In the plot above, averages of the two replicates are calculated and

plotted. I’d like to see the individual measurements:

```
plot(x=c(Hb_rep$New_rep1, Hb_rep$New_rep2),
y=c(Hb_rep$Ref_rep1, Hb_rep$Ref_rep2),
xlab="Hemoglobin concentration (g/L) - new method",
ylab="Hemoglobin concentration (g/L) - reference method")
lines (x=as.numeric(Hb_rep.BLS$Pred.BLS[,1]),
y=as.numeric(Hb_rep.BLS$Pred.BLS[,2]),
lwd=2)
lines (x=as.numeric(Hb_rep.BLS$Pred.BLS[,1]),
y=as.numeric(Hb_rep.BLS$Pred.BLS[,3]),
col="#8AB63F", lwd=2)
lines (x=as.numeric(Hb_rep.BLS$Pred.BLS[,1]),
y=as.numeric(Hb_rep.BLS$Pred.BLS[,4]),
col="#8AB63F", lwd=2)
lines (x=as.numeric(Hb_rep.BLS$Pred.BLS[,1]),
y=as.numeric(Hb_rep.BLS$Pred.BLS[,5]),
col="#1A425C", lwd=2)
lines (x=as.numeric(Hb_rep.BLS$Pred.BLS[,1]),
y=as.numeric(Hb_rep.BLS$Pred.BLS[,6]),
col="#1A425C", lwd=2)
abline (a=0, b=1, lty=2)
```

##### Remarks

- Although not appropriate in the context of method comparisonstudies, Pearson correlation is still frequently used. See Bland &

Altman (2003) for an explanation on why correlations are not

advised. - Methods presented in this blogpost are not applicable to time-series

##### References

- Confidence interval and prediction interval:

Applied Linear Statistical Models, 2005, Michael Kutner, Christopher

Nachtsheim, John Neter, William Li. Section 2.5

-Prediction interval for method comparison:

Bland, J. M. and Altman, D. G. (2003), Applying the right statistics:

analyses of measurement studies. Ultrasound Obstet Gynecol, 22: 85-93.

<doi:10.1002/uog.12> section: “Appropriate use of regression”.

Francq, B. G., and Govaerts, B. (2016) How to regress and predict in a

Bland-Altman plot? Review and contribution based on tolerance intervals

and correlated-errors-in-variables models. Statist. Med., 35: 2328-2358.

doi: 10.1002/sim.6872.

del Rio, F. J., Riu, J. and Rius, F. X. (2001), Prediction intervals in

linear regression taking into account errors on both axes. J.

Chemometrics, 15: 773-788. <doi:10.1002/cem.663>

-Example of a method comparison study:

H. Tian, M. Li, Y. Wang, D. Sheng, J. Liu, and L. Zhang, “Optical

wavelength selection for portable hemoglobin determination by

near-infrared spectroscopy method,” Infrared Phys. Techn 86, 98-102

(2017). doi.org/10.1016/j.infrared.2017.09.004.

-The predict() and lm() functions of R:

Chambers, J. M. and Hastie, T. J. (1992) Statistical Models in S.

Wadsworth & Brooks/Cole.