SSD Model Averaging

Introduction

The species sensitivity distribution (or SSD) is a cornerstone of modern inference in (statistical) ecotoxicology.

There are many shortcomings associated with SSDs but in the absence of a better, more credible and scientifically valid alternative, it’s the best we’ve got. As I wrote in a 2015 paper:

“It is also worthwhile mentioning an inconvenient truth of SSD modeling—it is not as objective as many would like to think, and indeed, it could be argued that it involves a substantially greater degree of subjectivity than the assessment factor approach”.

To see this, one need only consider the subjective decisions that the analyst makes at every stage of the SSD process, including, but not limited to, the following:

  • which test species to use;
  • ensuring representativeness of functional groups;
  • how many, if any, replicates in the concentration– response experiments;
  • how long to run the concentration–response tests;
  • what measure of toxicity to calculate;
  • how to “harmonize” different toxicity measures (e.g., arbitrary scaling of effective concentration data);
  • functional form fitted to the concentration–response data;
  • assumed probability model for the SSD;
  • method of estimating parameters of the SSD (e.g., moment estimators, maximum likelihood, resampling techniques, Bayesian estimators); and
  • making inference about the hazardous concentration (e.g., confidence intervals, credibility intervals, theoretically derived vs empirically based).

While simple in concept SSDs are fraught in practice. Two big issues are:

  1. sample sizes used to fit SSDs are woefully inadequate from a statistical perspective (eg: log-logistic is a two-parameter distribution and with n=5 this means degrees of freedom = 3);

  2. there is an implicit though nonetheless strong statistical assumption that the data used to fit the SSD represents a random sample from some larger population. This is patently false in ecotoxicology.

The effect of 1 is that virtually any ‘reasonable’ candidate probability model will not be rejected by standard goodness-of-fit tests.

This might not be such an issue if all we wanted to do was describe the distribution or compute a measure of location or dispersion. But, in ecotoxicology, we use the fitted distrbution to compute extreme percentiles and the results from different fitted distributions often vary substantially.

It’s easy to demonstrate this.

Example

We consider fitting various probability models to the following toxicity data:

13.265  8.273 21.223 16.236 17.021  3.285

The plots in Figure 1 show that only one of the 15 fitted distributions (3-parameter gamma) is rejected by goodness-of-fit tests.

Figure 1. Q-Q Plots for a variety of theoretical distributions

Figure 1. Q-Q Plots for a variety of theoretical distributions


Let’s look at some of the estimated quantiles.

Distribution \(H{{C}_{1}}\) \(H{{C}_{5}}\) \(H{{C}_{10}}\)
True value 0.868 4.276 6.092
Normal -1.89 2.54 4.90
Normal (Box-Cox) 1.15 1.85 2.21
Lognormal 2.30 3.67 4.71
Logistic -2.59 3.23 5.87
Log-logistic 2.57 4.54 5.87
Smallest extreme-value -6.39 1.57 5.07
Largest extreme-value 1.18 3.70 5.25
Gamma 2.26 4.00 5.26
Exponential 0.133 0.678 1.393
Exponential (2-parameter) 0.120 0.611 1.256
Weibull 2.197 4.327 5.836


Clearly, some of these are poor estimates even though the probability model was deemed to be adequate.

Given we have no theoretical basis for selecting one distribution over another, it makes sense to ‘combine’ (= weight) the results where the weights reflect our degree of confidence in the model fit.

Bayesian Approach

Source: David Fletcher, University of Otago

  • Unified approach to allowing for uncertainty
  • Model averaging natural
  • Priors for both parameters and models
  • Weighted combination of posteriors from different models
  • Weight: posterior probability of that model being true

Outline

\[\begin{align} & M=\text{ number of candidate models} \\ & y=\text{ response variable} \\ & \theta \text{ = parameter of interest} \\ \end{align}\]

What we’re interested in determining is \(p\left( \theta \left| y \right. \right)=\text{ posterior probability for }\theta \text{ given data}\)

It is readily shown that \(p\left( \theta \left| y \right. \right)=\sum\limits_{m=1}^{M}{p\left( m\left| y \right. \right)\ }p\left( \theta \left| y \right.,m \right)\) where: \[\begin{align} & p\left( m\left| y \right. \right)=\text{ posterior probability for model }m\text{ given data} \\ & p\left( \theta \left| y \right.,m \right)=\text{ posterior probability for }\theta \text{ given model }m\text{ and data} \\ \end{align}\]

Furthermore, \(p\left( m\left| y \right. \right)\propto p\left( m \right)p\left( y\left| m \right. \right)\) where

\[\begin{align} & p\left( m \right)=\text{ prior probability for model }m\text{ } \\ & p\left( y\left| m \right. \right)=\text{marginal likelihodd for model }m\text{ } \\ \end{align}\]

We can simulate from the required posterior probability density as follows:

  1. Select data-generating model at random from our set of models using prior model probabilities;
  2. Generate parameter values using specified prior distributions;
  3. Generate data from selected model with chosen parameter values;
  4. Repeat many times.

A weighted average is simply:

\[\bar{\theta }\text{=}\sum\limits_{m=1}^{M}{{{w}_{m}}}{{{\hat{\theta }}}_{m}}\text{ with }0<{{w}_{m}}<1;\text{ and }\sum\limits_{m=1}^{M}{{{w}_{m}}}=1\]

Question is: How do we choose the weights?

Frequentist Approach

A Frequentist approach is to base the weights on the likelihood of \(\text{mode}{{\text{l}}_{i}}\) relative to the ‘best’ fitting model.

Now, the AIC for the \({{i}^{th}}\) model is defined as:

\[AI{{C}_{i}}=-2\ln \left( {{L}_{i}} \right)+2{{k}_{i}}\]
where \({{L}_{i}}\) is the likelihood and \({{k}_{i}}\) is the number of parameters respectively.

Next, we form the differences:

\[{{\delta }_{i}}=AI{{C}_{i}}-AI{{C}_{0}}\]


where \(AI{{C}_{0}}\) is the AIC for the best-fitting model.

For simplicity, we’ll assume \({{k}_{i}}\equiv k\text{ }\forall \text{ }i\). Then it is straightforward to show that:

\[{{L}_{i}}={{L}_{0}}\exp \left( -\frac{1}{2}\,{{\delta }_{i}} \right)\]

And so, relative weights can be formed as:

\[{{w}_{i}}=\frac{{{L}_{i}}}{\sum\limits_{i=1}^{M}{{{L}_{i}}}}=\frac{{{L}_{0}}\exp \left( -\frac{1}{2}\,{{\delta }_{i}} \right)}{{{L}_{0}}\sum\limits_{i=1}^{M}{\exp \left( -\frac{1}{2}\,{{\delta }_{i}} \right)}}\]

Thus:

\[{{w}_{i}}=\frac{\exp \left( -\frac{1}{2}\,{{\delta }_{i}} \right)}{\sum\limits_{i=1}^{M}{\exp \left( -\frac{1}{2}\,{{\delta }_{i}} \right)}}\]

Let’s use R to do this ‘manually’.

require(fitdistrplus)
## Loading required package: fitdistrplus
## Loading required package: MASS
## Loading required package: survival
dat<-c(13.265,  8.273, 21.223, 16.236, 17.021 , 3.285)
dist<-list("norm","lnorm","logis","gamma")
fit<-lapply(dist,function(x) fitdist(data=dat,x ))
aic<-data.frame(dist=unlist(dist),AIC=NA)
aic$AIC<-unlist(lapply(fit,function(x) x$aic))
aic$delta<-aic$AIC-min(aic$AIC)
aic$wt<-exp(-0.5*aic$delta)/sum(exp(-0.5*aic$delta))
aic$HC1<-unlist(lapply(dist,function(x) do.call(paste("q",x,sep=""),list(0.01,fit[[1]][[1]][1],fit[[1]][[1]][2]))))
aic$HC5<-unlist(lapply(dist,function(x) do.call(paste("q",x,sep=""),list(0.05,fit[[1]][[1]][1],fit[[1]][[1]][2]))))
aic$HC10<-unlist(lapply(dist,function(x) do.call(paste("q",x,sep=""),list(0.10,fit[[1]][[1]][1],fit[[1]][[1]][2]))))
aic[,-1]<-round(aic[,-1],3)
aic
##    dist    AIC delta    wt     HC1    HC5    HC10
## 1  norm 42.381 0.000 0.363  -0.571  3.468   5.622
## 2 lnorm 44.520 2.139 0.125   0.565 32.080 276.304
## 3 logis 42.809 0.428 0.293 -14.018 -4.234   0.194
## 4 gamma 43.395 1.014 0.219   1.054  1.326   1.489

So, our model-averaged SSD (MASSD) is a weighted combination of the normal, log-normal, logistic, and gamma distributions in the following proportions: 0.363, 0.125, 0.293, and 0.219 respectively.

What about the estimated HCx values?

You might be tempted to think that the estimated HCx can be obtained by simply applying the weights to the separate HCx values obtained from each of the distributions used in forming the MASSD. THIS WOULD BE WRONG!

It’s a trivial task to demonstrate this. For example, the HC5 from our model-averaged fit is 5.47 whereas the weighted HC5 is 3.59 (see later). We can substitute these values back into our model-averaged SSD (calculations not shown) and determine the (theoretical) fraction affected:

## Model-averaged HC5 fraction affected = 5.00%
## Weighted HC5 fraction affected =  2.31%

Obtaining HCx values from a model-averaged SSD

Recall that an HCx is a quantile from the fitted probability model. Now, the SSD is in fact what statisticians refer to as a probability density function or pfd. We denote this as \({f_X}\left( x \right)\).

So, for example, the familiar normal pdf with \(mean = \mu\) and \(variance = \sigma^2\) is:\[{f_X}\left( x \right) = \frac{1}{{\sigma \sqrt {2\pi } }}{e^{ – \frac{1}{2}{{\left( {\frac{{x – \mu }}{\sigma }} \right)}^2}}}\]

The cumulative distribution function or cdf is a mathematical expression (if it exists) corresponding to the cumulative probability \(P\left[ {X \le r} \right]\) where r is any number and is written:\[P\left[ {X \le r} \right] = {F_X}\left( r \right) = \int_{ – \infty }^r {{f_X}\left( x \right)} \;dx\]

Now, the \({p^{th}}\) quantile (let’s call it \({\xi _p}\)) is such that:\[P\left[ {X \le {\xi _p}} \right] = p\]

So, for example, the median (or \({\xi _{0.5}}\)) is found by finding the value \({{\xi _{0.5}}}\) such that:\[P\left[ {X \le {\xi _{0.5}}} \right] = 0.5\].

From our definition of \({F_X}\left( \cdot \right)\) this means solving the integral equation:\[\int_{ – \infty }^{{\xi _{0.5}}} {{f_X}\left( x \right)} \;dx = 0.5\]

Fortunately, there’s an easy way in R – we use the inverse cdf or quantile functions.

So, for example, if we wanted to find the \({5^{th}}\) percentile of a normal distribution having mean=10 and standard deviation=12 :

qnorm(0.05,10,12)
## [1] -9.738244

or the \({80^{th}}\) percentile from the same distribution:

qnorm(0.8,10,12)
## [1] 20.09945


Now, returning to our model averaged SSD.

As explained above, the model averaged SSD is just a weighted combination of k individual pdf’s. That is:

\[\begin{array}{*{20}{c}} {{f_Y}\left( y \right) = \sum\limits_{i = 1}^k {{w_i}{f_i}\left( y \right)} }&{\begin{array}{*{20}{c}} {{\rm{with}}}&{\begin{array}{*{20}{c}} {0 \le {w_i} \le 1;}&{{\rm{and}}} \end{array}}&{\sum\limits_{i = 1}^k {{w_i} = 1} } \end{array}} \end{array}\]

The model-averaged cumulative distribution function is similarly defined:

\[\begin{array}{*{20}{c}} {{F_Y}\left( y \right) = \sum\limits_{i = 1}^k {{w_i}{F_i}\left( y \right)} }&{\begin{array}{*{20}{c}} {{\rm{with}}}&{\begin{array}{*{20}{c}} {0 \le {w_i} \le 1;}&{{\rm{and}}} \end{array}}&{\sum\limits_{i = 1}^k {{w_i} = 1} } \end{array}} \end{array}\]

So, the correct way of determining a model-averaged HCx is to solve:\[{F_Y}\left( {H{C_x}} \right) = x\] for \({H{C_x}}\).

This is no longer a simple matter of invoking one of R's intrinsic functions – although it can still be done with a single line of R code.

This is done by finding the root of the equation \({F_Y}\left( {H{C_x}} \right) – x\). In other words, solving:

\[{F_Y}\left( {H{C_x}} \right) – x = 0\]

The R function which does this is called uniroot which has the following form:

uniroot(f, interval, ...,
        lower = min(interval), upper = max(interval),
        f.lower = f(lower, ...), f.upper = f(upper, ...),
        extendInt = c("no", "yes", "downX", "upX"), check.conv = FALSE,
        tol = .Machine$double.eps^0.25, maxiter = 1000, trace = 0)

where

argument description
f the function for which the root is sought.
interval a vector containing the end-points of the interval to be searched for the root.
lower, upper the lower and upper end points of the interval to be searched.
f.lower, f.upper the same as f(upper) and f(lower), respectively. Passing these values from the caller where they are often known is more economical as soon as f() contains non-trivial computations.
extendInt character string specifying if the interval c(lower,upper) should be extended or directly produce an error when f() does not have differing signs at the endpoints. The default, “no”, keeps the search interval and hence produces an error. Can be abbreviated.
check.conv logical indicating whether a convergence warning of the underlying uniroot should be caught as an error and if non-convergence in maxiter iterations should be an error instead of a warning.
tol the desired accuracy (convergence tolerance).
maxiter the maximum number of iterations.
trace integer number; if positive, tracing information is produced. Higher values giving more details.


To see how this works, consider the previous example.

Example (continued)

Our model-averaged SSD from the previous example was a combination of normal, log-normal, logistic, and gamma distributions with weights 0.363, 0.125, 0.293, and 0.219 respectively. We first need to extract the parameter estimates for each of these distributions:

param<-lapply(fit, function (x) x$estimate)  # extract the estimated parameters for each component distribution

## Next form the model-averaged SSD

massd<-function(x,p){
  
F<- 0.363*pnorm(x,param[[1]][1],param[[1]][2])+0.125*plnorm(x,param[[2]][1],param[[2]][2])
       + 0.293*plogis(x,param[[3]][1],param[[3]][2])+0.219*pgamma(x,param[[4]][1],param[[4]][2])
return(F-p) #  p is probability (eg: 0.05 for an HC5) - to be supplied       
}


So we now use the uniroot function to find the \({H{C_5}}\):

uniroot(massd,c(0,10),p=0.05)$root
## [1] 5.472128


Had we applied the weights to the individual\({H{C_5}}\) values, we would have obtained:

p<-0.05
 0.363*qnorm(p,param[[1]][1],param[[1]][2])+0.125*qlnorm(p,param[[2]][1],param[[2]][2])+
        0.293*qlogis(p,param[[3]][1],param[[3]][2])+0.219*qgamma(p,param[[4]][1],param[[4]][2])
## [1] 3.590141


Which is substantially in error being 2/3rd the correct value