Using the Five Prototypes

That completes the five prototypes needed to understand most statistics; now we can add operations to them. Three different "sums of squares" (Prototypes #2 and #3) need to be understood and compared (using Prototype #4): the "sum of squares total" (SST), the "sum of squares between" (sometimes called the sum of squares regression; SSB or SSR), and the "sum of squares error" (SSE). SSE was presented in the last figure. It will also be useful to present the three sums of squares in four different ways: (1) numerically, (2) geometrically, (3) as formulae, and (4) as Venn diagrams. You should recognize that these are four ways of presenting the same thing.
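As a preview of the numerical presentation, the short Python sketch below computes the three sums of squares for a small data set. The Y values and the use of a least-squares straight line are my own illustrative assumptions, not the numbers of the worked example.

    # A minimal sketch of the three sums of squares, assuming a simple
    # one-predictor least-squares regression. The Y values are made up
    # for illustration; substitute the numbers from the worked example.
    import numpy as np

    x = np.array([1, 2, 3, 4, 5], dtype=float)   # the five X scores
    y = np.array([2, 1, 4, 3, 5], dtype=float)   # hypothetical Y scores

    slope, intercept = np.polyfit(x, y, 1)       # least-squares line
    y_hat = intercept + slope * x                # predicted Y values
    y_bar = y.mean()                             # mean of Y

    sst = np.sum((y - y_bar) ** 2)      # total sum of squares (SST)
    ssb = np.sum((y_hat - y_bar) ** 2)  # between / regression sum of squares (SSB or SSR)
    sse = np.sum((y - y_hat) ** 2)      # error sum of squares (SSE)

    print(sst, ssb, sse)
    assert np.isclose(sst, ssb + sse)   # the pieces fit together: SST = SSB + SSE

Running it prints the three values; the final line checks that the between and error pieces add back up to the total.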


The three sums of squares (SST, SSB, and SSE) are the basis of the "general linear model." Creative distribution of the "sum of squares regression" among the variables can be used to assess many different hypotheses or models; the identity behind that distribution is written out just below.
In each case (numerically, geometrically, as formulae, and as Venn diagrams) the example above will include SST, SSB (SSR), and SSE. At the same time I will "show my work," so that the information needed for each calculation is also given.
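That distribution rests on a single identity, stated here in plain notation for the least-squares fit used throughout this manual (Y-hat is the value predicted by the regression line and Y-bar is the mean of Y):

    SST = SSB + SSE
    sum of (Y - Y-bar)^2  =  sum of (Y-hat - Y-bar)^2  +  sum of (Y - Y-hat)^2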

Table 2-3. Rows 1 through 7 give either the mathematical notation or a verbal description of the calculation that produces the numbers in each column. Rows 8 through 12 give the numbers involved in that calculation. Row 13 is the sum of the numbers in the column, and row 14 is the mean of the column. Row 15 is the usual verbal description of the sum in the column, and row 16 is an abbreviation of that description.


B. Geometrically.

The geometric presentation of the model was started in Figure # 1 during the discussion of the prototypes, but it was not completed (although the prototypes were). The "error sum of squares" was presented in Figure # 4; the "total sum of squares" and the "between sum of squares" are presented in the figures that follow.

Figure # 5. Distances from the mean -- the total sum of squares (the same as little y squared).

Figure # 6. Distances between the data points and the regression line.
Figure # 7. Distances between the regression line and the mean of Y.
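Figures 5 through 7 can be reproduced with a few lines of plotting code. The sketch below is my own illustration, using the same made-up Y values as the earlier snippet; it draws the vertical segments whose squared lengths add up to SST, SSE, and SSB.

    # A sketch of the three kinds of distances shown in Figures 5 through 7.
    # The Y values are hypothetical; the X values are the 1-5 of the example.
    import numpy as np
    import matplotlib.pyplot as plt

    x = np.array([1, 2, 3, 4, 5], dtype=float)
    y = np.array([2, 1, 4, 3, 5], dtype=float)
    slope, intercept = np.polyfit(x, y, 1)       # least-squares line
    y_hat = intercept + slope * x
    y_bar = y.mean()

    fig, axes = plt.subplots(1, 3, figsize=(12, 4), sharey=True)
    titles = ["Data to mean (SST)", "Data to line (SSE)", "Line to mean (SSB)"]
    ends = [y, y, y_hat]                                     # one end of each segment
    starts = [np.full_like(y, y_bar), y_hat, np.full_like(y, y_bar)]  # the other end

    for ax, title, end, start in zip(axes, titles, ends, starts):
        ax.scatter(x, y)                         # the data points
        ax.plot(x, y_hat)                        # the regression line
        ax.axhline(y_bar, linestyle="--")        # the mean of Y
        ax.vlines(x, start, end, colors="red")   # the distances being squared
        ax.set_title(title)
        ax.set_xlabel("X")
    axes[0].set_ylabel("Y")
    plt.tight_layout()
    plt.show()

Each panel highlights one kind of distance; squaring and summing the red segments in a panel gives the corresponding sum of squares.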

This section now gives the formulae, and their names, for much of statistics. Think of it as learning a new vocabulary (not a set of formulas). It's a way of talking. You may use either the name or the formula; either one will get you a long way. Only the standard deviation will be new relative to what you have already covered.
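Since the standard deviation is the one new item, here is its usual form in the plain notation of this manual (some texts divide by N rather than N - 1): sum the squared deviations of the scores from their mean, divide by N - 1, and take the square root.

    standard deviation of X  =  square root of [ sum of (X - mean of X)^2 / (N - 1) ]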


These formulae will cover the essence of all of the statistics in this manual -- that is, they give you the intuitive genotype of each statistic, if not the actual statistic. The general linear model can be understood using this set. The statistics it will help you to understand are correlation, ANOVA (including the t-test), regression, multiple regression, MANOVA, factor analysis, discriminant function analysis, canonical analysis, and structural equation modeling.

We will next follow through with the above example so that you have a concrete reference to come back to. There are few numbers, so you can work through it easily.

All of the values in the formulas above are represented in this example. There are five observations of X (Raw Score X); therefore N = 5. Incidentally, there are also five observations of Y. The values of X are 1, 2, 3, 4, and 5. The sum of X is 15. Fifteen divided by 5 is 3 (the sum of X divided by N), giving the mean of X -- ditto for Y.
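The bookkeeping for X can be checked in a few lines of Python; only X is shown here because the Y values come from the table above, not from this paragraph.

    # Checking the arithmetic for X: N, the sum, and the mean.
    x = [1, 2, 3, 4, 5]
    n = len(x)               # N = 5
    total = sum(x)           # sum of X = 15
    mean_x = total / n       # 15 / 5 = 3
    print(n, total, mean_x)  # prints: 5 15 3.0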