Prototype 4

Using a Proportion to Compare Things

One more prototype is needed before a relationship can actually be assessed. We know how big (or how much, or how far) something is by comparing it to something familiar. For example, if we hear that someone weighs 250 pounds, we think that is pretty big. We know that because the average weight of a person is about 160 pounds. But how much bigger is 250 pounds than the average person? We divide 250 by 160, find that the result is 1.5625, and conclude that the 250-pound person is about one and a half times bigger. We might have done it the other way around, dividing 160 by 250 to get .64, and found that the average person is about 6/10ths, or 64%, the size of the large person (we get the 64% by multiplying .64 by 100).
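The two ratios above can be checked with a few lines of Python. This is only a small sketch of the arithmetic; the variable names are made up for illustration.

```python
# Weights taken from the paragraph above.
large_person = 250
average_person = 160

# How many times bigger the large person is than the average person.
ratio_large_to_average = large_person / average_person   # 1.5625

# The same comparison turned around: the average person as a fraction
# of the large person.
ratio_average_to_large = average_person / large_person   # 0.64

# Multiplying by 100 turns the fraction into a percentage (about 64).
percent = ratio_average_to_large * 100

print(ratio_large_to_average, ratio_average_to_large, round(percent))
```

Either ratio describes the same comparison; which one is used depends only on which weight is treated as the familiar reference.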


In prototype # 4 we are going to compare prototype # 2 with prototype # 3 by the use of a proportion or ratio. Are the squares (squaring each number and adding them up) bigger than the products (multiplying the number in one set times the number in the other set and adding them up) of the two sets? The degree to which the products are as large as the squares is the degree to which the two sets are related (this concept is key to understanding the general linear model). If we compute a ratio between those two results (sum of products and sum of squares), it will in fact indicate the relationship between those two sets of numbers.

Most statistics are concerned with a relationship between two or more sets of numbers. Consequently, the concept of a relationship between two or more sets of numbers is central to the concept of statistics. The prototypes that have been presented are all that is necessary for conceptual understanding, but some added calculations are needed before a correlation, t-test, or regression can be worked out. Before the relationship between two sets of numbers can be determined, both sets need to have a range and an "anchor" point. The average, or mean, of the set is used for that anchor. The steps that were carried out on the previous sets will be performed on the sets below using the differences from the mean. The first set of numbers will be identified as X and the second set identified as Y.

Set A and Set C are the same sets we have been working with. Set B is X minus the mean of X (X - 3), or x (little x), and Set D is Y minus the mean of Y (Y - 3), or y (little y). Set E and Set F are the squares of little x and little y respectively. Set G is the product of little x times little y.
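The sets described above can be built step by step in Python. The original table is not reproduced here, so the values of X and Y below are an assumption: they were chosen only because they have a mean of 3 and give the sums of 10 that the text works with.

```python
# Assumed data (hypothetical): chosen to have a mean of 3 and sums of
# squares of 10, matching the figures quoted in the text.
X = [1, 2, 3, 4, 5]   # Set A
Y = [1, 2, 3, 4, 5]   # Set C

mean_x = sum(X) / len(X)   # the "anchor" for X
mean_y = sum(Y) / len(Y)   # the "anchor" for Y

x = [v - mean_x for v in X]           # Set B: little x
y = [v - mean_y for v in Y]           # Set D: little y
x_sq = [v * v for v in x]             # Set E: little x squared
y_sq = [v * v for v in y]             # Set F: little y squared
xy = [a * b for a, b in zip(x, y)]    # Set G: little x times little y

print(sum(x_sq), sum(y_sq), sum(xy))  # 10.0 10.0 10.0
```

With these assumed values the three sums come out equal, which is the situation the text goes on to describe as a perfect correlation.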

It should be noted that "larger numbers multiplied by themselves getting larger faster" applies to absolute values (disregarding the signs) in this case. That can be seen where -2 times -2 is equal to 4, whereas -1 times -1 is 1. Remember that squaring a set of numbers and adding the squares together will give the largest possible result for that set of numbers. That is seen in little x squared and little y squared. Consequently, multiplying x times y and adding those products together will indicate something about the relationship between the two sets. That can be done by comparing three results: the sum of little x squared, the sum of little y squared, and the sum of little x times little y (the sum of the cross products).

The formal method of making that comparison is called the Pearson Correlation Coefficient. It is accomplished by the fourth prototype -- the ratio. In this case the two squared sets need to be averaged, since there are two of them and only one set of cross products. If all problems were as simple as this one, we could merely add 10 and 10 together and divide by 2, giving the result of 10. However, these numbers will usually be different, and simple arithmetic would not take into account that "large numbers produce larger numbers." Instead, we must multiply the sum of x² times the sum of y² and then take the square root of that. In this case the result is still 10. The final step is to divide this result into the sum of little xy (x times y); that is, divide 10 by 10 (producing a ratio). The result is 1.00, indicating a perfect correlation. The formula that we have just worked out is:

r = (sum of xy) / square root of [(sum of x²) × (sum of y²)]
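The calculation just described can be sketched as a short Python function. The function name pearson_r and the sample data are assumptions made for illustration; the data are hypothetical values chosen to reproduce the sums of 10 used in the text.

```python
import math

def pearson_r(X, Y):
    """Sum of cross products divided by the square root of the product
    of the two sums of squares, all taken on deviations from the mean."""
    mean_x = sum(X) / len(X)
    mean_y = sum(Y) / len(Y)
    x = [v - mean_x for v in X]                  # little x
    y = [v - mean_y for v in Y]                  # little y
    sum_x_sq = sum(v * v for v in x)             # sum of little x squared
    sum_y_sq = sum(v * v for v in y)             # sum of little y squared
    sum_xy = sum(a * b for a, b in zip(x, y))    # sum of cross products
    return sum_xy / math.sqrt(sum_x_sq * sum_y_sq)

# Hypothetical data consistent with the sums of 10 quoted in the text.
r = pearson_r([1, 2, 3, 4, 5], [1, 2, 3, 4, 5])
print(r)  # 1.0
```

Note that the function divides by zero if every value in either set is the same, since that set then has no range around its anchor.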

Notice that the only change in the sums is the sum of xy, which is now 9 rather than 10. That results in a lower correlation: 9 divided by 10, or .90.
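One way to see a sum of xy of 9 with both sums of squares still at 10 is sketched below. The table for this example is not reproduced here, so the Y values are hypothetical: they are the numbers 1 through 5 with the two middle values swapped.

```python
import math

X = [1, 2, 3, 4, 5]
Y = [1, 3, 2, 4, 5]   # hypothetical: same numbers, two of them swapped

x = [v - sum(X) / len(X) for v in X]   # little x
y = [v - sum(Y) / len(Y) for v in Y]   # little y

sum_x_sq = sum(v * v for v in x)
sum_y_sq = sum(v * v for v in y)
sum_xy = sum(a * b for a, b in zip(x, y))

r = sum_xy / math.sqrt(sum_x_sq * sum_y_sq)
print(sum_x_sq, sum_y_sq, sum_xy, r)  # 10.0 10.0 9.0 0.9
```

Because the same numbers are still in each set, the sums of squares cannot change; only the pairing of the numbers, and therefore the cross products, is different.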

One more example is needed before we get to a real-world example. In this example the scale of the Y variable is changed while the correlation remains the same. A constant of 6 has been added to each of the numbers of the Y variable.
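The reason the correlation does not change can be sketched in Python: adding a constant moves the mean by the same amount, so the differences from the mean stay exactly as they were. The Y values below are hypothetical, and deviations is a helper name made up for this illustration.

```python
Y = [1, 2, 3, 4, 5]              # hypothetical original Y values
Y_shifted = [v + 6 for v in Y]   # a constant of 6 added to each number

def deviations(values):
    """Return each value minus the mean of the set (the little y's)."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

print(deviations(Y))          # [-2.0, -1.0, 0.0, 1.0, 2.0]
print(deviations(Y_shifted))  # [-2.0, -1.0, 0.0, 1.0, 2.0]
```

Since the squares and cross products are computed from these differences, every sum that goes into the ratio is unchanged as well.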


Notice how all of the absolute results remain the same as in the above example of the perfect correlation. However, the sign changes in the sum of xy. Consequently, you can see that it will now be a perfect negative correlation.
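A sign change of this kind can be sketched as follows. The data for this example are not reproduced here, so the values are an assumption: Y is simply the same numbers as X in reverse order, which is one way to get identical sums of squares and a negative sum of xy.

```python
import math

X = [1, 2, 3, 4, 5]
Y = [5, 4, 3, 2, 1]   # hypothetical: the same values in reverse order

x = [v - sum(X) / len(X) for v in X]   # little x
y = [v - sum(Y) / len(Y) for v in Y]   # little y

sum_xy = sum(a * b for a, b in zip(x, y))
r = sum_xy / math.sqrt(sum(v * v for v in x) * sum(v * v for v in y))
print(sum_xy, r)  # -10.0 -1.0
```

Each large positive little x is now paired with a large negative little y, so every nonzero cross product is negative while the squares are unaffected.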