# SSD Model Averaging

*Professor David R. Fox*

*March 19, 2019*

# Introduction

The **species sensitivity distribution** (or SSD) is a cornerstone of modern inference in (statistical) ecotoxicology.

There are many shortcomings associated with SSDs but in the absence of a better, more credible and scientifically valid alternative, it’s the best we’ve got. As I wrote in a 2015 paper:

*“It is also worthwhile mentioning an inconvenient truth of SSD modeling—it is not as objective as many would like to think, and indeed, it could be argued that it involves a substantially greater degree of subjectivity than the assessment factor approach”.*

To see this, one need only consider the subjective decisions that the analyst makes at every stage of the SSD process, including, but not limited to, the following:

- which test species to use;
- ensuring representativeness of functional groups;
- how many, if any, replicates in the concentration–response experiments;
- how long to run the concentration–response tests;
- what measure of toxicity to calculate;
- how to “harmonize” different toxicity measures (e.g., arbitrary scaling of effective concentration data);
- functional form fitted to the concentration–response data;
- assumed probability model for the SSD;
- method of estimating parameters of the SSD (e.g., moment estimators, maximum likelihood, resampling techniques, Bayesian estimators); and
- making inference about the hazardous concentration (e.g., confidence intervals, credibility intervals, theoretically derived vs empirically based).

While simple in concept, SSDs are fraught in practice. Two **big** issues are:

1. sample sizes used to fit SSDs are woefully inadequate from a statistical perspective (e.g. the log-logistic is a two-parameter distribution, so with *n = 5* this means only *3 degrees of freedom*);
2. there is an *implicit*, though nonetheless strong, statistical assumption that the data used to fit the SSD represent a **random sample** from some larger population. This is patently false in ecotoxicology.

The effect of **1** is that virtually *any* ‘reasonable’ candidate probability model will not be rejected by standard goodness-of-fit tests.
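To see why, consider a quick simulation (an illustrative sketch, not part of the original analysis; the parameter values are arbitrary): data generated from a lognormal distribution comfortably 'pass' a Kolmogorov–Smirnov test of normality at SSD-typical sample sizes.

```r
# Illustrative sketch: at SSD-typical sample sizes, even a mis-specified
# model survives a goodness-of-fit test (parameter values are arbitrary).
set.seed(1)
x <- rlnorm(6, meanlog = 2.4, sdlog = 0.6)  # truly lognormal "toxicity" data
# Test the (wrong) normal model against these data
ks <- ks.test(x, "pnorm", mean = mean(x), sd = sd(x))
ks$p.value  # large p-value: the wrong model is not rejected
```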

This might not be such an issue if all we wanted to do was describe the distribution or compute a measure of location or dispersion. But in ecotoxicology we use the fitted distribution to compute **extreme percentiles**, and the results from different fitted distributions often vary substantially.

It’s easy to demonstrate this.

## Example

We consider fitting various probability models to the following toxicity data:

`13.265 8.273 21.223 16.236 17.021 3.285`

The plots in Figure 1 show that only **one** of the 15 fitted distributions (3-parameter gamma) is rejected by goodness-of-fit tests.

Let’s look at some of the estimated quantiles.

Distribution | \(HC_1\) | \(HC_5\) | \(HC_{10}\) |
---|---|---|---|
True value | 0.868 | 4.276 | 6.092 |
Normal | -1.89 | 2.54 | 4.90 |
Normal (Box-Cox) | 1.15 | 1.85 | 2.21 |
Lognormal | 2.30 | 3.67 | 4.71 |
Logistic | -2.59 | 3.23 | 5.87 |
Log-logistic | 2.57 | 4.54 | 5.87 |
Smallest extreme-value | -6.39 | 1.57 | 5.07 |
Largest extreme-value | 1.18 | 3.70 | 5.25 |
Gamma | 2.26 | 4.00 | 5.26 |
Exponential | 0.133 | 0.678 | 1.393 |
Exponential (2-parameter) | 0.120 | 0.611 | 1.256 |
Weibull | 2.197 | 4.327 | 5.836 |

Clearly, some of these are poor estimates even though the probability model was deemed to be adequate.

Given we have no theoretical basis for selecting one distribution over another, it makes sense to ‘combine’ (= weight) the results where the weights reflect our degree of confidence in the model fit.

# Bayesian Approach

*Source: David Fletcher, University of Otago*

- Unified approach to allowing for uncertainty
- Model averaging natural
- Priors for both parameters and models
- Weighted combination of posteriors from different models
- Weight: posterior probability of that model being true

### Outline

\[\begin{align} & M=\text{ number of candidate models} \\ & y=\text{ response variable} \\ & \theta \text{ = parameter of interest} \\ \end{align}\]

What we’re interested in determining is \(p\left( \theta \left| y \right. \right)=\text{ posterior probability for }\theta \text{ given data}\)

It is readily shown that \(p\left( \theta \left| y \right. \right)=\sum\limits_{m=1}^{M}{p\left( m\left| y \right. \right)\ }p\left( \theta \left| y \right.,m \right)\) where: \[\begin{align} & p\left( m\left| y \right. \right)=\text{ posterior probability for model }m\text{ given data} \\ & p\left( \theta \left| y \right.,m \right)=\text{ posterior probability for }\theta \text{ given model }m\text{ and data} \\ \end{align}\]

Furthermore, \(p\left( m\left| y \right. \right)\propto p\left( m \right)p\left( y\left| m \right. \right)\) where

\[\begin{align} & p\left( m \right)=\text{ prior probability for model }m \\ & p\left( y\left| m \right. \right)=\text{ marginal likelihood for model }m \\ \end{align}\]

We can simulate from the required posterior probability density as follows:

- Select data-generating model at random from our set of models using prior model probabilities;
- Generate parameter values using specified prior distributions;
- Generate data from selected model with chosen parameter values;
- Repeat many times.
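A minimal sketch of these four steps (not the author's code; the two-model set and the parameter priors are assumed purely for illustration):

```r
# Sketch: simulate prior-predictive data sets from a two-model set
# (normal vs lognormal) with equal prior model probabilities.
set.seed(42)
sim_one <- function() {
  m <- sample(1:2, 1, prob = c(0.5, 0.5))  # pick a model using its prior probability
  if (m == 1) {
    mu <- rnorm(1, 10, 2); sigma <- runif(1, 1, 5)  # assumed parameter priors
    rnorm(6, mu, sigma)                             # data from model 1
  } else {
    mu <- rnorm(1, 2, 0.5); sigma <- runif(1, 0.2, 1)
    rlnorm(6, mu, sigma)                            # data from model 2
  }
}
sims <- replicate(10000, sim_one())  # repeat many times
dim(sims)  # 6 x 10000: each column is one simulated data set
```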

A weighted average is simply:

\[\bar{\theta }\text{=}\sum\limits_{m=1}^{M}{{{w}_{m}}}{{{\hat{\theta }}}_{m}}\text{ with }0<{{w}_{m}}<1;\text{ and }\sum\limits_{m=1}^{M}{{{w}_{m}}}=1\]

#### The question is: how do we choose the weights?

# Frequentist Approach

A frequentist approach is to base the weights on the likelihood of model \(i\) relative to the ‘best’ fitting model.

Now, the AIC for the \({{i}^{th}}\) model is defined as:

\[AI{{C}_{i}}=-2\ln \left( {{L}_{i}} \right)+2{{k}_{i}}\]

where \({{L}_{i}}\) and \({{k}_{i}}\) are the likelihood and the number of parameters, respectively, for model \(i\).

Next, we form the differences:

\[{{\delta }_{i}}=AI{{C}_{i}}-AI{{C}_{0}}\]

where \(AI{{C}_{0}}\) is the *AIC* for the best-fitting model.

For simplicity, we’ll assume \({{k}_{i}}\equiv k\text{ }\forall \text{ }i\). Then it is straightforward to show that:

\[{{L}_{i}}={{L}_{0}}\exp \left( -\frac{1}{2}\,{{\delta }_{i}} \right)\]
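To make the intermediate step explicit, substitute the definition of AIC into the difference (with a common \(k\)):

\[{{\delta }_{i}}=\left[ -2\ln \left( {{L}_{i}} \right)+2k \right]-\left[ -2\ln \left( {{L}_{0}} \right)+2k \right]=-2\ln \left( \frac{{{L}_{i}}}{{{L}_{0}}} \right)\]

so \(\ln \left( {{L}_{i}}/{{L}_{0}} \right)=-\tfrac{1}{2}{{\delta }_{i}}\), and exponentiating gives the expression for \({{L}_{i}}\) above.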

And so, relative weights can be formed as:

\[{{w}_{i}}=\frac{{{L}_{i}}}{\sum\limits_{i=1}^{M}{{{L}_{i}}}}=\frac{{{L}_{0}}\exp \left( -\frac{1}{2}\,{{\delta }_{i}} \right)}{{{L}_{0}}\sum\limits_{i=1}^{M}{\exp \left( -\frac{1}{2}\,{{\delta }_{i}} \right)}}\]

Thus:

\[{{w}_{i}}=\frac{\exp \left( -\frac{1}{2}\,{{\delta }_{i}} \right)}{\sum\limits_{i=1}^{M}{\exp \left( -\frac{1}{2}\,{{\delta }_{i}} \right)}}\]

#### Let’s use `R` to do this ‘manually’

`require(fitdistrplus)`

`## Loading required package: fitdistrplus`

`## Loading required package: MASS`

`## Loading required package: survival`

```
dat <- c(13.265, 8.273, 21.223, 16.236, 17.021, 3.285)
dist <- list("norm", "lnorm", "logis", "gamma")
fit <- lapply(dist, function(x) fitdist(data = dat, distr = x))
aic <- data.frame(dist = unlist(dist), AIC = NA)
aic$AIC <- unlist(lapply(fit, function(x) x$aic))
aic$delta <- aic$AIC - min(aic$AIC)
aic$wt <- exp(-0.5 * aic$delta) / sum(exp(-0.5 * aic$delta))
aic[, -1] <- round(aic[, -1], 3)
aic
```

```
##    dist    AIC delta    wt
## 1  norm 42.381 0.000 0.363
## 2 lnorm 44.520 2.139 0.125
## 3 logis 42.809 0.428 0.293
## 4 gamma 43.395 1.014 0.219
```

So, our *model-averaged SSD* **(MASSD)** is a weighted combination of the *normal, log-normal, logistic, and gamma* distributions in the proportions `0.363, 0.125, 0.293, and 0.219` respectively.

**What about the estimated \(HC_x\) values?**

You might be tempted to think that the estimated \(HC_x\) can be obtained by simply applying the weights to the separate \(HC_x\) values obtained from each of the distributions used in forming the **MASSD**. *THIS WOULD BE WRONG!*

It’s a trivial task to demonstrate this. For example, the \(HC_5\) from our model-averaged fit is 5.47, whereas the *weighted* \(HC_5\) is 3.59 (see later). We can substitute these values back into our model-averaged SSD (calculations not shown) and determine the (theoretical) fraction affected:

`## Model-averaged HC5 fraction affected = 5.00%`

`## Weighted HC5 fraction affected = 2.31%`

### Obtaining \(HC_x\) values from a model-averaged SSD

Recall that an \(HC_x\) is a *quantile* of the fitted probability model. Now, the SSD is in fact what statisticians refer to as a *probability density function* or *pdf*. We denote this as \({f_X}\left( x \right)\).

So, for example, the familiar *normal pdf* with \(mean = \mu\) and \(variance = \sigma^2\) is:\[{f_X}\left( x \right) = \frac{1}{{\sigma \sqrt {2\pi } }}{e^{ - \frac{1}{2}{{\left( {\frac{{x - \mu }}{\sigma }} \right)}^2}}}\]

The *cumulative distribution function* or *cdf* is a mathematical expression (if it exists) corresponding to the **cumulative probability** \(P\left[ {X \le r} \right]\), where *r* is any number, and is written:\[P\left[ {X \le r} \right] = {F_X}\left( r \right) = \int_{ - \infty }^r {{f_X}\left( x \right)} \;dx\]

Now, the \({p^{th}}\) quantile (let’s call it \({\xi _p}\)) is such that:\[P\left[ {X \le {\xi _p}} \right] = p\]

So, for example, the *median* (or \({\xi _{0.5}}\)) is the value such that:\[P\left[ {X \le {\xi _{0.5}}} \right] = 0.5\]

From our definition of \({F_X}\left( \cdot \right)\) this means solving the *integral equation*:\[\int_{ - \infty }^{{\xi _{0.5}}} {{f_X}\left( x \right)} \;dx = 0.5\]
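In principle, this integral equation can be solved by brute force. The following sketch (illustrative only, not part of the original analysis) recovers the standard normal median by bisection, computing the cdf with `integrate`:

```r
# Sketch: solve the integral equation for the standard normal median by
# bisection on F(r) - 0.5, where F(r) is computed by numerical integration.
cdf <- function(r) integrate(dnorm, -Inf, r)$value
lo <- -5; hi <- 5
for (i in 1:50) {            # simple bisection search
  mid <- (lo + hi) / 2
  if (cdf(mid) < 0.5) lo <- mid else hi <- mid
}
mid  # ~0, the median of the standard normal
```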

Fortunately, there’s an easy way in `R`: we use the *inverse cdf* (or *quantile*) functions.

So, for example, if we wanted to find the \({5^{th}}\) percentile of a normal distribution having *mean = 10* and *standard deviation = 12*:

`qnorm(0.05,10,12)`

`## [1] -9.738244`

or the \({80^{th}}\) percentile from the same distribution:

`qnorm(0.8,10,12)`

`## [1] 20.09945`

**Now, returning to our model averaged SSD.**

As explained above, the *model averaged SSD* is just a **weighted combination** of *k* individual *pdf’s*. That is:

\[\begin{array}{*{20}{c}} {{f_Y}\left( y \right) = \sum\limits_{i = 1}^k {{w_i}{f_i}\left( y \right)} }&{\begin{array}{*{20}{c}} {{\rm{with}}}&{\begin{array}{*{20}{c}} {0 \le {w_i} \le 1;}&{{\rm{and}}} \end{array}}&{\sum\limits_{i = 1}^k {{w_i} = 1} } \end{array}} \end{array}\]

The **model-averaged cumulative distribution function** is similarly defined:

\[\begin{array}{*{20}{c}} {{F_Y}\left( y \right) = \sum\limits_{i = 1}^k {{w_i}{F_i}\left( y \right)} }&{\begin{array}{*{20}{c}} {{\rm{with}}}&{\begin{array}{*{20}{c}} {0 \le {w_i} \le 1;}&{{\rm{and}}} \end{array}}&{\sum\limits_{i = 1}^k {{w_i} = 1} } \end{array}} \end{array}\]

So, the *correct* way of determining a model-averaged \(HC_x\) is to solve:\[{F_Y}\left( {H{C_x}} \right) = x\] for \({H{C_x}}\).

This is no longer a simple matter of invoking one of `R`’s intrinsic functions, although it can still be done with a single line of `R` code.

This is done by finding the **root** of the equation \({F_Y}\left( {H{C_x}} \right) - x\). In other words, solving:

\[{F_Y}\left( {H{C_x}} \right) - x = 0\]

The `R` function which does this is called `uniroot`, which has the following form:

```
uniroot(f, interval, ...,
lower = min(interval), upper = max(interval),
f.lower = f(lower, ...), f.upper = f(upper, ...),
extendInt = c("no", "yes", "downX", "upX"), check.conv = FALSE,
tol = .Machine$double.eps^0.25, maxiter = 1000, trace = 0)
```

*where*

argument | description |
---|---|
`f` | the function for which the root is sought. |
`interval` | a vector containing the end-points of the interval to be searched for the root. |
`lower, upper` | the lower and upper end points of the interval to be searched. |
`f.lower, f.upper` | the same as `f(lower)` and `f(upper)`, respectively. Passing these values from the caller, where they are often known, is more economical when `f()` contains non-trivial computations. |
`extendInt` | character string specifying whether the interval `c(lower, upper)` should be extended, or an error produced directly, when `f()` does not have differing signs at the endpoints. The default, `"no"`, keeps the search interval and hence produces an error. Can be abbreviated. |
`check.conv` | logical indicating whether a convergence warning of the underlying `uniroot` should be caught as an error, and whether non-convergence in `maxiter` iterations should be an error instead of a warning. |
`tol` | the desired accuracy (convergence tolerance). |
`maxiter` | the maximum number of iterations. |
`trace` | integer; if positive, tracing information is produced, with higher values giving more details. |

To see how this works, consider the previous example.

#### Example (*continued*)

Our model-averaged SSD from the previous example was a combination of the *normal, log-normal, logistic, and gamma* distributions with weights `0.363, 0.125, 0.293, and 0.219` respectively. We first need to extract the *parameter estimates* for each of these distributions:

```
param <- lapply(fit, function(x) x$estimate)  # extract the estimated parameters for each component distribution
## Next form the model-averaged SSD (its cdf, minus the target probability)
massd <- function(x, p) {
  F <- 0.363 * pnorm(x, param[[1]][1], param[[1]][2]) +
       0.125 * plnorm(x, param[[2]][1], param[[2]][2]) +
       0.293 * plogis(x, param[[3]][1], param[[3]][2]) +
       0.219 * pgamma(x, param[[4]][1], param[[4]][2])
  return(F - p)  # p is the probability (eg: 0.05 for an HC5) - to be supplied
}
```

So we now use the `uniroot` function to find the \({H{C_5}}\):

`uniroot(massd,c(0,10),p=0.05)$root`

`## [1] 5.472128`

Had we applied the weights to the individual \({H{C_5}}\) values, we would have obtained:

```
p<-0.05
0.363*qnorm(p,param[[1]][1],param[[1]][2])+0.125*qlnorm(p,param[[2]][1],param[[2]][2])+
0.293*qlogis(p,param[[3]][1],param[[3]][2])+0.219*qgamma(p,param[[4]][1],param[[4]][2])
```

`## [1] 3.590141`

*This is substantially in error, being roughly two-thirds of the correct value.*
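This discrepancy is a property of mixtures generally, not of this particular data set. A self-contained sketch (with arbitrary illustrative parameters, not the fitted SSD) shows how far apart the quantile of a weighted mixture and the weighted average of the component quantiles can be:

```r
# Two-component normal mixture: equal weights, very different spreads
w <- c(0.5, 0.5)
Fmix <- function(x) w[1] * pnorm(x, 10, 1) + w[2] * pnorm(x, 10, 10)
# Quantile of the mixture: root of Fmix(x) - 0.05
hc5_mix <- uniroot(function(x) Fmix(x) - 0.05, c(-50, 50))$root
# Weighted average of the component quantiles
hc5_wtd <- w[1] * qnorm(0.05, 10, 1) + w[2] * qnorm(0.05, 10, 10)
c(hc5_mix, hc5_wtd)  # about -2.8 versus about 0.95
```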