ANOVA is really weird. It tests for a difference in means by looking at the ratio of variances. How does that work then?

The two components of variance that ANOVA considers are one due to sampling variation within groups and one due to differences between the group means.

If the component of variance due to differences between means (in meta-analysis, heterogeneity between trials) is large compared to that due to sampling error (the inverse of the sum of the inverse-variances), then the means differ by more than sampling error alone would explain.

The t test and the F test will give identical answers (the same p-value, because F equals t squared) if there are only two groups to compare (and so the t test can be applied). They are mathematically closely related, with the relationships between distributions being as follows:
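The two-group equivalence can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names `pooled_t` and `anova_f` and the data are made up for illustration, and the t statistic assumes equal variances in the two groups.

```python
from statistics import mean, variance

def pooled_t(a, b):
    """Two-sample t statistic with pooled variance (equal-variance assumption)."""
    na, nb = len(a), len(b)
    # Pooled within-group variance on na + nb - 2 degrees of freedom
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / (sp2 * (1 / na + 1 / nb)) ** 0.5

def anova_f(groups):
    """One-way ANOVA F statistic: between-group over within-group mean square."""
    n_total = sum(len(g) for g in groups)
    k = len(groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of observations around their own group mean (sampling variation)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

# Hypothetical data, for illustration only
group_a = [4.1, 5.0, 5.6, 4.8, 5.3]
group_b = [5.9, 6.4, 5.7, 6.8, 6.1]

t_stat = pooled_t(group_a, group_b)
f_stat = anova_f([group_a, group_b])
print(t_stat ** 2, f_stat)  # the two values agree up to rounding error
```

With more than two groups only `anova_f` applies, which is where the ratio of variances earns its keep.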

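chi-2 (k df) = sum of k independent squared standard normals (a sample variance from n observations has df = n-1)

t (k df) = standard normal / square root of (chi-2 / k) (for a two-group comparison of n observations, df = n-2)

F (df1, df2) = (chi-2 / df1) / (chi-2 / df2)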

We can produce a simple inverse-variance weighted average to summarise the broad location of the treatment effect. When constructing confidence intervals around this we must include both components of the variance. The fixed effects model includes only one (sampling error).
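As a rough sketch of the fixed effects calculation, the code below pools hypothetical trial estimates by inverse-variance weighting and builds a 95% confidence interval from the sampling-error component alone. The function name `fixed_effect_pool`, the data, and the normal-approximation multiplier 1.96 are assumptions made for illustration.

```python
import math

def fixed_effect_pool(estimates, variances):
    """Inverse-variance weighted average and its fixed effects variance."""
    weights = [1 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    # Sampling-error variance only: inverse of the sum of the inverse-variances
    pooled_var = 1 / sum(weights)
    return pooled, pooled_var

# Hypothetical treatment effects (e.g. log hazard ratios) and their variances
estimates = [-0.60, -0.10, -0.45, 0.25]
variances = [0.04, 0.02, 0.09, 0.03]

pooled, pooled_var = fixed_effect_pool(estimates, variances)
half_width = 1.96 * math.sqrt(pooled_var)  # normal approximation, assumed here
ci = (pooled - half_width, pooled + half_width)
print(pooled, ci)
```

This interval ignores any heterogeneity between trials, so it will be too narrow when the trials disagree by more than sampling error would suggest.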

The full variance model provides a simple fix for the fixed effects model. You can calculate it simply by multiplying the pooled variance estimate from the fixed effects model by the chi-2 for heterogeneity between trials divided by its degrees of freedom (the over-dispersion parameter, phi). The model is well described in Thompson & Sharp, SIM, 1999 (link in main article above).
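The multiplication described above might look like the following sketch. It is not taken from Thompson & Sharp; the function name `full_variance` and the data are hypothetical, and phi is applied exactly as computed, with no lower bound imposed (whether to floor it at 1 is a choice this sketch does not make).

```python
def full_variance(estimates, variances):
    """Scale the fixed effects variance by the over-dispersion parameter phi."""
    weights = [1 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    pooled_var = 1 / sum(weights)
    # chi-2 for heterogeneity between trials (Cochran's Q)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1
    phi = q / df  # over-dispersion parameter
    return pooled, pooled_var, phi, phi * pooled_var

# Hypothetical treatment effects and their variances, for illustration only
estimates = [-0.60, -0.10, -0.45, 0.25]
variances = [0.04, 0.02, 0.09, 0.03]

pooled, fixed_var, phi, full_var = full_variance(estimates, variances)
print(phi, fixed_var, full_var)
```

With these made-up numbers phi comes out well above 1, so the full variance is several times the fixed effects variance and the confidence interval widens accordingly.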

For subgroup analysis in meta-analysis the test for interaction needs modification for exactly the same reasons. Rather than asking whether the amount of heterogeneity between groups of trials is large (the test for interaction) we need to ask whether the amount of heterogeneity between groups of trials is large compared to the amount of heterogeneity which remains within groups of trials (the F test).

This is because if there is a lot of heterogeneity overall, it all has to end up somewhere, either between subgroups or within them. You can easily end up with a highly significant test for interaction when the F test shows nothing at all. Conversely, the F test can strengthen the evidence by essentially giving credit for homogeneity within subgroups as well as heterogeneity between them when testing the null hypothesis that there is no subgroup effect.

This is further described in Sandercock, Parmar, Torri & Qian, BJC, 2002 (link in main article above).
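One way to read the subgroup F test is sketched below: split the total heterogeneity chi-2 into a between-subgroup part and a within-subgroup part, divide each by its degrees of freedom, and take the ratio. This is an illustrative interpretation rather than the exact procedure from the paper; the function names `q_statistic` and `subgroup_f` and the data are hypothetical.

```python
def q_statistic(estimates, variances):
    """Return (pooled estimate, total weight, heterogeneity chi-2) for a set of trials."""
    weights = [1 / v for v in variances]
    total_w = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, estimates)) / total_w
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))
    return pooled, total_w, q

def subgroup_f(subgroups):
    """Compare between-subgroup heterogeneity with what remains within subgroups."""
    all_est = [y for est, _ in subgroups for y in est]
    all_var = [v for _, var in subgroups for v in var]
    overall, _, q_total = q_statistic(all_est, all_var)
    per_group = [q_statistic(est, var) for est, var in subgroups]
    # Heterogeneity left over inside each subgroup
    q_within = sum(q for _, _, q in per_group)
    # Heterogeneity of the subgroup pooled estimates around the overall estimate
    q_between = sum(w * (p - overall) ** 2 for p, w, _ in per_group)
    df_between = len(subgroups) - 1
    df_within = len(all_est) - len(subgroups)
    f_stat = (q_between / df_between) / (q_within / df_within)
    return {"q_total": q_total, "q_between": q_between, "q_within": q_within,
            "df": (df_between, df_within), "f": f_stat}

# Two hypothetical subgroups of three trials each: (estimates, variances)
subgroups = [
    ([-0.50, -0.40, -0.55], [0.04, 0.05, 0.04]),
    ([-0.05, 0.10, 0.00], [0.04, 0.05, 0.04]),
]
result = subgroup_f(subgroups)
print(result)
```

The usual test for interaction would refer `q_between` to a chi-2 distribution on `df[0]` degrees of freedom, whereas the F test refers `f` to an F distribution on `df`, so heterogeneity within subgroups counts against the subgroup effect and homogeneity within them counts in its favour.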

]]>