

**Correlations and Cross-Tabulation**

//__**Correlational Research**__//

Researchers try to determine whether, and to what degree, a relationship exists between two (or more) nonmanipulated variables.
This approach is sometimes referred to as associational research.

Correlational research can involve two quantitative variables, two categorical variables, or one quantitative and one categorical variable.
Examples (each variable may be quantitative, Q, or categorical, C):
Reading Achievement (Q or C) and Interest Level in School (Q or C)
Reading Achievement (Q or C) and Method Used for Reading Instruction (Q or C)
Student Gender (Q or C) and College Major (Q or C)
Mathematical Ability (Q or C) and Career Choice (Q or C)

Correlational research does not prove causation; however, researchers often make causal statements as a result of their studies.
====Correlational Research --> relationships **(how)** --> become focus of future investigations --> investigations help us learn **why** variables are related --> detect patterns or connections between variables --> better understand the world in which we live====

====Because correlational research often serves as a springboard to, or foundation for, additional research, a hypothesis that predicts the existence of a relationship is very common.====

The degree to which two variables are related is expressed by a correlation coefficient.
Correlations are often categorized as positive, negative, or none. A **positive correlation** indicates a direct relationship between the variables; that is, high scores on one variable tend to be associated with high scores on the other variable, or low scores on one variable with low scores on the other. Example below:
 * Education Level in Years || Starting Salary in Dollars ||
 * 11 || 18000 ||
 * 12 || 30000 ||
 * 16 || 40000 ||
 * 18 || 43000 ||
 * 20 || 50000 ||

Graphing the points in this table would result in a scatterplot in quadrant 1 that runs from the lower left to the upper right (/).

A **negative correlation** indicates an indirect or inverse relationship exists between the variables. High scores on one variable are associated with low scores on the other variable, or low with high. Example below:
 * Weight in Pounds || Life Expectancy in Years ||
 * 200 || 80 ||
 * 250 || 73 ||
 * 300 || 68 ||
 * 350 || 60 ||
 * 400 || 52 ||

Graphing the points in this table would result in a scatterplot in quadrant 1 that runs from the upper left to the lower right (\).
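
The direction of each relationship can be checked numerically. The following is a minimal sketch, assuming a hand-written helper named `pearson_r` (not taken from the class texts) that applies the usual product-moment formula to the two example tables above.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson product-moment correlation for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Data copied from the two example tables.
education = [11, 12, 16, 18, 20]
salary = [18000, 30000, 40000, 43000, 50000]
weight = [200, 250, 300, 350, 400]
life_expectancy = [80, 73, 68, 60, 52]

r_positive = pearson_r(education, salary)
r_negative = pearson_r(weight, life_expectancy)
print(f"education vs. salary: r = {r_positive:.2f}")
print(f"weight vs. life expectancy: r = {r_negative:.2f}")
```

The first coefficient comes out positive and the second negative, matching the (/) and (\) shapes of the two scatterplots.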

====The value of the coefficient of correlation falls in the range -1 ≤ r ≤ +1, with -1 and +1 representing perfect negative and positive correlations, respectively. The closer the correlation coefficient gets to +1, the stronger the positive correlation; the closer it gets to -1, the stronger the negative correlation. A correlation of 0 is interpreted to mean that no correlation exists between the measured variables.====

====Squaring the correlation coefficient (r²) yields the percentage of the variability among the dependent variable scores that can be attributed to differences in the scores on the independent variable.====

====For example, suppose a strong positive correlation exists between high school grades and college GPA. Assume r = .65. Then r² = .4225, so about 42% of the differences in college GPAs can be attributed to differences in students' high school grades.====

Here are some suggested cut scores for correlations, taken from our "Fraenkel and Wallen" class text.

 * **r** || **r²** || **Interpretation** ||
 * .35 || 12% || Weak relationship, No use for predicting ||
 * .5 || 25% || Predictions can be made but will be very crude with huge errors. ||
 * .65 || 42% || Prediction said to be reasonably accurate ||
 * .85 || 72% || Close relationship between variables and very useful for predicting ||
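
The percentages in the middle column follow directly from squaring r. As a small sketch (the helper name `percent_explained` is an assumption made for illustration), the table values can be reproduced like this:

```python
# r values from the cut-score table; r squared is the share of variability explained.
cut_scores = [0.35, 0.50, 0.65, 0.85]

def percent_explained(r):
    """Return r squared expressed as a whole-number percentage."""
    return round(r ** 2 * 100)

for r in cut_scores:
    print(f"r = {r:.2f}  ->  r^2 = {r ** 2:.4f}  (about {percent_explained(r)}%)")
```

Note how quickly the explained share drops: halving r from about .85 to about .42 cuts r² to roughly a quarter of its former value.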

There are different kinds of correlational procedures. The procedure chosen depends upon the type of variables involved in the research. For information on specific procedures, refer to our "Huck" class text, chapter 3, pp. 58-67.


 * **Correlational Procedure** || **When to Use** ||
 * Pearson's Product-Moment || Two Quantitative Variables, Both Measured as Raw Scores ||
 * Spearman's Rho (Rank-Order) || Two Quantitative Variables, Both Expressed as Ranks ||
 * Kendall's Tau || Two Quantitative Variables Expressed as Ranks, with Ties in Rank ||
 * Point Biserial || One Quantitative Variable and One True Dichotomous Variable ||
 * Biserial || One Quantitative Variable and One Artificial Dichotomous Variable ||
 * Phi || Both variables are true dichotomies ||
 * Tetrachoric || Both variables are artificial dichotomies ||
 * Cramer's V || Two Qualitative Variables ||
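
To illustrate how the choice of procedure changes the result, the sketch below computes Spearman's rho for the education and salary data from the positive-correlation example by ranking each variable and then applying the product-moment formula to the ranks. The helper names `ranks`, `pearson_r`, and `spearman_rho` are assumptions made for this example, and the ranking helper assumes there are no tied scores.

```python
from math import sqrt

def ranks(values):
    """Rank values from 1 (smallest) upward; assumes no tied values."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def pearson_r(xs, ys):
    """Pearson product-moment correlation for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def spearman_rho(xs, ys):
    """Spearman's rank-order correlation: Pearson's r computed on the ranks."""
    return pearson_r(ranks(xs), ranks(ys))

education = [11, 12, 16, 18, 20]
salary = [18000, 30000, 40000, 43000, 50000]

rho = spearman_rho(education, salary)
print(f"Spearman's rho = {rho:.2f}")
```

Because salary rises every time education rises, the two sets of ranks are identical and rho reaches its maximum, even though the raw scores do not fall on a perfectly straight line.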

__//**Cross Tabulation Research**//__

Cross-tabulation is the process of creating a contingency table from the multivariate frequency distribution of statistical variables. Heavily used in survey research, cross tabulations (or crosstabs for short) can be produced by a range of statistical packages, including some that are specialised for the task. Survey weights often need to be incorporated. Unweighted tables can be easily produced by some spreadsheets and other business intelligence tools, where they are commonly known as pivot tables. (From Wikipedia, the free encyclopedia)

**Purpose and Arrangement of Table.** Crosstabulation is a combination of two (or more) frequency tables arranged such that each cell in the resulting table represents a unique combination of specific values of crosstabulated variables. Thus, crosstabulation allows us to examine frequencies of observations that belong to specific categories on more than one variable. By examining these frequencies, we can identify relations between crosstabulated variables. Only categorical (nominal) variables or variables with a relatively small number of different meaningful values should be crosstabulated. Note that in the cases where we do want to include a continuous variable in a crosstabulation (e.g., income), we can first recode it into a particular number of distinct ranges (e.g., low, medium, high).

**2x2 Table.** The simplest form of crosstabulation is the 2 by 2 table where two variables are "crossed," and each variable has only two distinct values. For example, suppose we conduct a simple study in which males and females are asked to choose one of two different brands of soda pop (brand A and brand B); the data file can be arranged like this:

 * ~  ||~ GENDER ||~ SODA ||
 * ~ case 1 || MALE || A ||
 * ~ case 2 || FEMALE || B ||
 * ~ case 3 || FEMALE || B ||
 * ~ case 4 || FEMALE || A ||
 * ~ case 5 || MALE || B ||
 * ~ ... || ... || ... ||

The resulting crosstabulation could look as follows.


 * ~  ||~ SODA: A ||~ SODA: B ||~   ||
 * ~ GENDER: MALE || 20 (40%) || 30 (60%) || 50 (50%) ||
 * ~ GENDER: FEMALE || 30 (60%) || 20 (40%) || 50 (50%) ||
 * ~  || 50 (50%) || 50 (50%) || 100 (100%) ||
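
A table like this can be rebuilt by counting case-level records. The sketch below is only an illustration: the list named `cases` is hypothetical, constructed so that its counts match the table, because the source shows only the first few rows of the data file.

```python
from collections import Counter

# Hypothetical raw data file: one (gender, soda) pair per case, built so the
# counts match the crosstabulation in the text.
cases = (
    [("MALE", "A")] * 20 + [("MALE", "B")] * 30
    + [("FEMALE", "A")] * 30 + [("FEMALE", "B")] * 20
)

cell_counts = Counter(cases)                        # each gender/soda combination
row_totals = Counter(gender for gender, _ in cases)  # marginal counts for gender
col_totals = Counter(soda for _, soda in cases)      # marginal counts for soda

for gender in ("MALE", "FEMALE"):
    cells = []
    for soda in ("A", "B"):
        count = cell_counts[(gender, soda)]
        row_pct = 100 * count / row_totals[gender]
        cells.append(f"{count} ({row_pct:.0f}%)")
    print(gender, *cells, row_totals[gender], sep=" | ")
print("TOTAL", col_totals["A"], col_totals["B"], len(cases), sep=" | ")
```

Each `Counter` plays the role of one frequency table: `cell_counts` holds the body of the crosstabulation, while `row_totals` and `col_totals` hold its margins.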

Each cell represents a unique combination of values of the two crosstabulated variables (row variable Gender and column variable Soda), and the numbers in each cell tell us how many observations fall into each combination of values. In general, this table shows us that more females than males chose the soda pop brand A, and that more males than females chose soda B. Thus, gender and preference for a particular brand of soda may be related (later we will see how this relationship can be measured).
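
One possible way to put a number on this relationship is the phi coefficient, which the procedures table earlier lists for two dichotomous variables. This choice is an assumption made for illustration and is not necessarily the measure the source goes on to describe. The function name `phi_coefficient` is likewise hypothetical; it applies the standard 2x2 formula to the four cell counts.

```python
from math import sqrt

# Cell counts from the 2x2 table: rows are gender, columns are soda brand.
a, b = 20, 30   # MALE:   soda A, soda B
c, d = 30, 20   # FEMALE: soda A, soda B

def phi_coefficient(a, b, c, d):
    """Phi for a 2x2 table laid out as [[a, b], [c, d]]."""
    numerator = a * d - b * c
    denominator = sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return numerator / denominator

phi = phi_coefficient(a, b, c, d)
print(f"phi = {phi:.2f}")
```

The sign of phi depends only on how the rows and columns happen to be ordered, so it is the size of the coefficient, not its direction, that describes the strength of the association here.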

**Marginal Frequencies.** The values in the margins of the table are simply one-way (frequency) tables for all values in the table. They are important in that they help us to evaluate the arrangement of frequencies in individual columns or rows. For example, the frequencies of 40% and 60% of males and females (respectively) who chose soda A (see the first column of the above table) would not indicate any relationship between Gender and Soda if the marginal frequencies for Gender were also 40% and 60%; in that case they would simply reflect the different proportions of males and females in the study. Thus, the differences between the distributions of frequencies in individual rows (or columns) and in the respective margins inform us about the relationship between the cross-tabulated variables.

**Column, Row, and Total Percentages.** The example in the previous paragraph demonstrates that in order to evaluate relationships between cross-tabulated variables, we need to compare the proportions of marginal and individual column or row frequencies. Such comparisons are easiest to perform when the frequencies are presented as percentages.
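
As a rough sketch of the three kinds of percentages, the code below starts from the soda example's cell counts and divides each count by its row total, its column total, and the grand total. The dictionary layout and variable names are assumptions chosen for this illustration.

```python
# Cell counts from the soda example, keyed by (gender, soda).
counts = {
    ("MALE", "A"): 20, ("MALE", "B"): 30,
    ("FEMALE", "A"): 30, ("FEMALE", "B"): 20,
}
genders = ("MALE", "FEMALE")
sodas = ("A", "B")

# Marginal totals: one per row, one per column, plus the grand total.
grand_total = sum(counts.values())
row_totals = {g: sum(counts[(g, s)] for s in sodas) for g in genders}
col_totals = {s: sum(counts[(g, s)] for g in genders) for s in sodas}

# The same count expressed against three different bases.
row_pct = {key: 100 * n / row_totals[key[0]] for key, n in counts.items()}
col_pct = {key: 100 * n / col_totals[key[1]] for key, n in counts.items()}
total_pct = {key: 100 * n / grand_total for key, n in counts.items()}

for key in counts:
    print(
        key,
        f"row {row_pct[key]:.0f}%",
        f"column {col_pct[key]:.0f}%",
        f"total {total_pct[key]:.0f}%",
    )
```

In this particular table the row and column percentages coincide because every margin equals 50; with unequal margins they would differ, which is why the base of each percentage should always be stated.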
