For various complicated reasons, the simulations needed to examine empirically the operating characteristics of these three models, both for meta-analysis and for subgroup analysis in meta-analysis, will take some time to complete. I’m having a computer built that is powerful enough to run them in a reasonable time-frame.

However, there really is no reason not to just go ahead and disseminate the basic idea. It is statistically sound and should be uncontroversial.

It’s all about how we translate techniques used in single trials to meta-analysis (many single trials to be pooled). We can do this by accounting for (over-dispersion due to) heterogeneity between trials when calculating the standard error of the weighted average (pooled) result and, by analogy with ANOVA, extend the test for interaction to subgroup analysis in meta-analysis.

The trick is not collapsing trial level data when calculating the variance, which is what the fixed effect model does. Here’s how to fix it:

**Introducing the full variance model:**

1. Thompson & Sharp, *SIM*, 1999 compared the fixed effect, full variance and random effects models in a single meta-analysis with substantial heterogeneity between trials. They didn’t call it the full variance model; the name is the only bit I’ve contributed. They don’t like it, but they are wrong, for reasons given in the extract from my thesis appended to this post.

2. The full variance model is what Richard Peto thinks the fixed effect model is, and it is what it should be. The fixed effect model gets the effect estimate (a simple inverse-variance weighted average) right but collapses trials when calculating the pooled variance. The full variance model reinstates the trial level when estimating the pooled variance to give a much more reasonable estimate of uncertainty where there is heterogeneity. That is, it includes both components of variance (uncertainty): that due to sampling error and that due to between-trial heterogeneity.

3. Under-dispersion is best ignored (it is most likely to be due to chance or publication bias). Both random effects and full variance models default to the fixed effect model when data are very (or too) homogeneous, that is, when the chi-squared statistic for between-trial heterogeneity is less than or equal to its degrees of freedom, *χ^{2}_{het} ≤ df*.

4. David Spiegelhalter, *QSHC*, 2005, tried both the full variance model (aka inverse-variance weighted regression) and the random effects model for analysing over-dispersed audit data and concluded that both were fit for purpose but that the random effects model was a better reflection of reality. And he is right. For audit data.

5. But for experimental data? The true underlying (subgroup) effects that the random effects model tries to account for are often due to bias (a failure of the trial to reflect reality), and the smallest trials are well known to be the most prone to bias. So we must not over-model unexplained heterogeneity when the data are experimental, and thus a fixed effect assumption (aka a weighted average) may often provide a more useful summary.

6. But not such an obstinate fixed effect assumption that heterogeneity is simply ignored! Take any meta-analysis and shuffle the trial estimates, keeping the fixed effect summary estimate the same. The confidence interval won’t move at all either, regardless of the amount of heterogeneity between trials. That cannot be right.

7. The full variance model corrects the pooled variance estimate of the fixed effect model by multiplying it by the over-dispersion factor *φ* = *χ^{2}_{het}/df*. If data are under-dispersed the fixed effect model should be used.
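Points 6 and 7 can be sketched in a few lines of Python. This is a minimal illustration, assuming trial estimates on an additive scale (such as log odds ratios) with known within-trial variances; the function names and the numbers are made up for the example and are not taken from the spreadsheet.

```python
def fixed_effect(estimates, variances):
    """Inverse-variance weighted average and its fixed effect variance."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    return pooled, 1.0 / sum(weights)


def full_variance(estimates, variances):
    """Same pooled estimate; variance scaled by phi = chi2_het / df."""
    pooled, fe_var = fixed_effect(estimates, variances)
    q_het = sum((y - pooled) ** 2 / v for y, v in zip(estimates, variances))
    df = len(estimates) - 1
    # Under-dispersed data (chi2_het <= df) default to the fixed effect model.
    phi = max(1.0, q_het / df)
    return pooled, fe_var * phi, phi


# Hypothetical data: identical variances and identical pooled estimate,
# but very different amounts of heterogeneity between trials.
variances = [0.04, 0.04, 0.04, 0.04]
tight = [-0.32, -0.30, -0.30, -0.28]
spread = [-0.90, -0.50, -0.10, 0.30]

for label, estimates in (("tight", tight), ("spread", spread)):
    pooled, fe_var = fixed_effect(estimates, variances)
    _, fv_var, phi = full_variance(estimates, variances)
    print(label, round(pooled, 3), round(fe_var, 4), round(fv_var, 4), round(phi, 2))
```

The fixed effect variance is the same for both data sets because it depends only on the within-trial variances. The full variance estimate is unchanged for the homogeneous set and widens for the heterogeneous one.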

**Summary & a simple proof**

The full variance model is what the fixed effect model is commonly assumed to be, as eloquently explained by Richard Peto in his many publications on the subject. It will be years before I can publish everything I want to on this, and it took me ten years to believe that I had grasped this correctly. But I have. Here’s a simple proof (I am not a mathematician):

Find a binary subgroup:

1-do an F-test as in Sandercock, Parmar, Torri, Qian, *BJC*, 2002;

2-meta-analyse within groups using the fixed effect model, and

3-do a t-test for a difference between groups using the estimates from 2.

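With the fixed effect model, the F-test from step 1 and the t-test from step 3 will generally disagree. Repeat the second and third steps using the full variance model instead. This time, the two tests will give identical answers, as theoretically they should.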
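The three steps can be checked numerically. The sketch below uses hypothetical log hazard ratios and makes two assumptions that are not spelled out above: a single over-dispersion factor *φ* is pooled across both subgroups (as in a weighted regression with one dispersion parameter), and it is not truncated at 1. Under those assumptions the squared t statistic equals the F statistic exactly. If *φ* were estimated separately within each subgroup, the agreement would only be approximate.

```python
import math


def pool(estimates, variances):
    """Fixed effect pooling: (estimate, sum of weights, chi2_het, df)."""
    weights = [1.0 / v for v in variances]
    w_sum = sum(weights)
    est = sum(w * y for w, y in zip(weights, estimates)) / w_sum
    q = sum(w * (y - est) ** 2 for w, y in zip(weights, estimates))
    return est, w_sum, q, len(estimates) - 1


# Hypothetical trial results (estimates, variances) in two subgroups.
group_a = ([-0.45, -0.10, -0.60, -0.20], [0.04, 0.02, 0.09, 0.03])
group_b = ([0.05, -0.30, 0.20], [0.05, 0.02, 0.06])

est_a, w_a, q_a, df_a = pool(*group_a)
est_b, w_b, q_b, df_b = pool(*group_b)
_, _, q_total, _ = pool(group_a[0] + group_b[0], group_a[1] + group_b[1])

# Step 1: F-test, between-group heterogeneity relative to within-group.
q_within = q_a + q_b
df_within = df_a + df_b
q_between = q_total - q_within
f_stat = (q_between / 1) / (q_within / df_within)

# Steps 2 and 3 with the fixed effect model: within-group heterogeneity ignored.
diff = est_a - est_b
fe_var_diff = 1.0 / w_a + 1.0 / w_b
z_squared = diff ** 2 / fe_var_diff

# Steps 2 and 3 with the full variance model (common phi, an assumption).
phi = q_within / df_within
t_squared = diff ** 2 / (phi * fe_var_diff)

print("fixed effect z^2:", round(z_squared, 4))
print("full variance t^2:", round(t_squared, 4), "F:", round(f_stat, 4))
print("t^2 equals F:", math.isclose(t_squared, f_stat, rel_tol=1e-9))
```

The fixed effect statistic is just the usual chi-squared for interaction on 1 degree of freedom. Dividing it by the pooled *φ* turns it into the F statistic, which is why the t-test (on the within-group degrees of freedom) and the F-test coincide.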

**Conclusions**

a) The pooled variance estimate of the fixed effect model is historically mis-specified. As a result, the underlying fixed effects assumption is extremely (and unrealistically) strong.

b) The full variance model corrects the mis-specification of the fixed effect model and is probably the best summary of experimental data with unexplained over-dispersion which may be due to bias.

c) The random effects model is probably the best model where data are thought to accurately reflect reality (observed heterogeneity known to reflect real world variation in effect).

d) Both random effects and full variance models default to the fixed effect model where data are under-dispersed.

The full variance model can be applied with a calculator to the existing literature if heterogeneity is reported or calculable.
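As a rough sketch of that calculator step, the function below converts a published fixed effect result to the full variance model. It assumes the estimate and confidence interval are on the analysis scale (for example a log hazard ratio) and that the interval was built with a normal quantile. The function name and the example figures are hypothetical, and with few trials a t quantile might be preferred.

```python
import math
from statistics import NormalDist


def full_variance_from_report(estimate, ci_lower, ci_upper, q_het, df, level=0.95):
    """Rescale a reported fixed effect interval by sqrt(phi), phi = chi2_het / df."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    fe_se = (ci_upper - ci_lower) / (2 * z)  # recover the fixed effect SE
    phi = max(1.0, q_het / df)               # under-dispersion: leave unchanged
    fv_se = fe_se * math.sqrt(phi)
    return estimate - z * fv_se, estimate + z * fv_se, phi


# Hypothetical published result: estimate -0.30 (95% CI -0.50 to -0.10),
# heterogeneity chi-squared of 20 on 8 degrees of freedom.
lower, upper, phi = full_variance_from_report(-0.30, -0.50, -0.10, q_het=20.0, df=8)
print(round(lower, 3), round(upper, 3), phi)
```

For a ratio measure reported on its natural scale, take logs of the estimate and interval first and exponentiate the result.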

**References & spreadsheet**

Sandercock, Parmar, Torri, Qian, *BJC*, 2002 and Thompson & Sharp, *SIM*, 1999. It is statistically sound. I worked it out because we were wrestling with that *BJC* paper at the same time as I was doing my MSc in statistics. I hit on using the F-test to correct the test for heterogeneity for within trial heterogeneity (an important adjustment for subgroup analysis in meta-analysis compared to within single trials), and the rest follows.

This spreadsheet is for meta-regression (categorical sub-grouping variables only) using the full variance model. Use and distribute freely, let me know if there are any errors. It includes a quick calculator to convert fixed effect results to the full variance model.

**Extract (critique of Thompson & Sharp, SIM, 1999):**

Discussing the multiplicative (“full variance”) model for heterogeneity, Thompson & Sharp note:

“*The idea that the variance of the estimated effect within each study should be multiplied by some constant has little intuitive appeal, and leads to the same dominance of large studies over smaller studies that has been criticised in the context of fixed effect meta-analysis.*”

However, it is not clear that weighting by study size is inherently problematic. Not only does this weighting reflect the amount of information contributed by each trial, it is negatively associated with bias and positively associated with study quality and generalisability.

This approach is also computationally simple, may be calculated using the output from standard meta-analysis software, and may be applied retrospectively to published studies with just a calculator. Whilst the assumptions underlying the random effects model are undoubtedly reasonable, the full variance model is also appealing from a variety of perspectives.

The over-dispersion parameter, *φ* = *χ^{2}_{het}/df*, has no impact on the relative contribution of each trial to the pooled result, which is weighted solely by inverse-variance (i.e. study size). The only effect of *φ* is to adjust the pooled variance for over-dispersion, reflecting the additional uncertainty introduced by heterogeneity between trials.

This work by Josie Sandercock is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.

One way to think of this is by analogy with ANOVA.

ANOVA is really weird. It tests for a difference in means by looking at the ratio of variances. How does that work then?

The two components of variance that ANOVA considers are that due to sample variation and that due to a difference between means.

If the component of variance due to differences between means (heterogeneity between trials) is large compared to that due to sampling error (inverse of the sum of the inverse-variances), then we can conclude that there is a large difference between means.

The t test and the F test will give identical answers if there are only two groups to compare (and so the t test can be applied). They are mathematically closely related, with the relationships between distributions being as follows:

chi-2 (df = k) = sum of k independent squared standard normals

t (df) = normal / sqrt(chi-2/df)

F (df1, df2) = (chi-2/df1) / (chi-2/df2)

so that t-squared (df) = F (1, df)

We can produce a simple inverse-variance weighted average to summarise the broad location of the treatment effect. When constructing confidence intervals around this we must include both components of the variance. The fixed effects model includes only one (sampling error).

The full variance model provides a simple fix for the fixed effects model. You can calculate it simply by multiplying the pooled variance estimate from the fixed effects model by the chi-2 for heterogeneity between trials divided by its degrees of freedom (the over-dispersion parameter, phi). The model is well described in Thompson & Sharp, SIM, 1999 (link in main article above).

For subgroup analysis in meta-analysis the test for interaction needs modification for exactly the same reasons. Rather than asking whether the amount of heterogeneity between groups of trials is large (the test for interaction) we need to ask whether the amount of heterogeneity between groups of trials is large compared to the amount of heterogeneity which remains within groups of trials (the F test).

This is because if there is a lot of heterogeneity overall, it all has to end up somewhere. You can easily end up with a highly significant test for interaction when the F test shows nothing at all. Conversely the F test can strengthen the evidence by essentially giving credit for homogeneity within subgroups as well as heterogeneity between them when testing the null hypothesis that there is no subgroup effect.
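In symbols, one way to write the two tests is as follows. This is a sketch in notation chosen here for illustration, assuming *k* trials split into *g* subgroups and the usual partition of the heterogeneity chi-squared into between-group and within-group parts:

```latex
\begin{aligned}
\chi^2_{\text{total}} &= \chi^2_{\text{between}} + \chi^2_{\text{within}} \\[4pt]
\text{test for interaction:}\quad
  & \chi^2_{\text{between}} \ \text{referred to}\ \chi^2_{g-1} \\[4pt]
\text{F test:}\quad
  & F = \frac{\chi^2_{\text{between}} / (g-1)}{\chi^2_{\text{within}} / (k-g)}
    \ \text{referred to}\ F_{g-1,\;k-g}
\end{aligned}
```

When the trials are very heterogeneous overall, the between-group chi-squared can be large simply because the total is large. The F test divides by the within-group heterogeneity per degree of freedom, so the between-group part has to stand out against that background.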

This is further described in Sandercock, Parmar, Torri & Qian, BJC, 2002 (link in main article above).